Preface


The 80386 microprocessor is the first 32-bit member of the popular Intel 8086 family. 
This family’s applications include communications equipment, instrumentation, 
graphics, CAD/CAM, test equipment, industrial control, process control, business 
equipment, and military systems. Family members are also the CPUs in the IBM PC 
and PS/2 series, PC compatibles, and PC clones. Its early introduction, many associated 
devices, and extensive hardware and software support have helped the 8086 family 
dominate the microprocessor market. 

This book is a general introduction to the 80386 for programmers, engineers, tech- 
nicians, systems analysts, teachers, students, and personal computer users. It can also 
serve as a primer for computer and data processing professionals and instructors and 
as a supplemental text for courses in computer organization, microprocessors, personal
computers, and computer applications. It emphasizes the 80386’s key features and the 
differences between it and the earlier 8088, 8086, and 80286 chips. Examples illustrate 
the use of these features in typical applications and point out problems and pitfalls. 

The book assumes that readers are familiar with the 8086 family and with a program- 
ming language such as BASIC, C, FORTRAN, or Pascal. Readers should also have 
some background in computer architecture and assembly language programming, 
derived either from course work or from practical experience. 

The book is organized as follows: 

Chapter 1 is a general overview of the 80386. It emphasizes new features and con- 
cepts, including support for virtual memory, multitasking, multiuser systems, operat- 
ing systems, and high-level languages. It also compares the 80386 with previous 
processors, describes typical applications, and lists new features that the next genera- 
tion of processors may have. 

Chapter 2 presents the 80386’s instruction set from the user’s point of view. It em- 
phasizes the major features of the processor’s architecture and addressing modes. It 
then focuses on frequently used instructions before dealing with the entire instruction 
set. It also discusses the 80386’s improved performance, the effects of instructions on 
flags, differences between the 80386 and earlier processors, and assembler directives. 

Chapter 3 covers the basics of practical 80386 assembly language programming. It
starts with simple programs and proceeds through bit manipulation, shifting, decision
making, array manipulation, table lookup, character manipulation, code conversion, 
multiple-precision arithmetic, and data structure manipulation. An example section 
contains complete programs. Final sections discuss parameter passing methods, ways 
to make programs run faster, and common programming errors. 

Chapter 4 describes I/O. It covers I/O addressing, I/O instructions, programmable 
I/O chips, interrupts, and direct memory access (DMA). It includes discussions of the 
popular 8250 serial interface, 8255 parallel interface, 8259 interrupt controller, and 
8237 DMA controller. 

Chapter 5 deals with the 80386’s memory management facilities. It first describes 
the processor’s operating modes. It then covers segmentation, paging, memory protec- 
tion, the creation of descriptors, privileged instructions, and the initialization of 
memory management systems. 

Chapter 6 explains the 80386’s task management features. It discusses task descrip- 
tors, task state segments, privilege levels, task switching, task linking, and task address 
spaces. It also covers I/O privilege levels, I/O permission bit maps, and the initializa- 
tion of tasking systems. 

Chapter 7 deals with the 80386’s exceptions and debugging features. It describes 
the sources of exceptions, error codes, and exception conditions. A final section dis- 
cusses the debug registers. 

Chapter 8 presents the 80386’s hardware features. It includes a general overview of 
the signal structure and bus operations. It also covers numerical coprocessors, memory 
interfacing, and cache memory. 

The appendixes contain the instruction sets of the 80386 and the 80287 and 80387 
numerical coprocessors. They also contain summaries of the status flags and descrip- 
tor formats, as well as summaries of the differences between the 80386 and earlier 
processors. 

This book should give the reader enough understanding of the 80386 processor to 
tackle many projects. These could include the design and development of embedded 
systems, the rewriting of programs for 80386-based computers, and the creation of new 
utilities for the 80386. This book should also alert readers to likely new features in 
80386-based computers and systems. It should thus help readers prepare for the in- 
evitable transition from 16-bit processors to 32-bit processors in both personal com- 
puters and embedded systems. 

Many people and organizations contributed to the writing of this book. Steve Guty 
of Bantam Books was the acquisitions editor. Claudette Moore, Gary Masters, and 
Mike Halvorson of Microsoft Press provided many editorial suggestions. Phil Barrett,
Paul Butzi, and (especially) Hans Spiller of Microsoft were willing to share their ex- 
periences in programming the 80386. George Fahouris, Lisa Figlioli, Clif Purkiser, and 
Doug Rick of Intel provided materials and encouragement. Tracey McAllister of Intel, 
Rosemary Morrissey of IBM, and others provided photographs of actual systems. 
Microsoft and Compaq Computers loaned me equipment and manuals. I also profited 
from conversations with Chris Ruff of Sorrento Valley Associates, Irvin Stafford of 
Unisys Corporation, and Dave Flower of Compaq Computer. Two anonymous 
reviewers offered many last-minute additions, clarifications, and corrections. After a 
brief siege of grumbling, I made most of the changes. I certainly appreciate their
efforts, particularly the Bantam reviewer's. Last but not least, I want to acknowledge the
support and encouragement of my wife, Donna, and my daughters, Elizabeth and Stacy. 

This book is dedicated to two of my teachers at Roosevelt High School in Seattle, 
Washington: Elizabeth Clark and Ann Grodal. They showed me the importance of 
words and ideas and challenged me to strive for clarity and insight. 

Introducing
the 80386


Why, a four-year old child could understand this report.
Run out and get me a four-year old child.
I can’t make head or tail out of it.
Groucho Marx
“Duck Soup”


The 80386 microprocessor is a major advance over the 8088, 8086, and 80286 devices. 
Its 32-bit architecture allows it to handle twice as much data at a time as can 16-bit 
processors. Computers based on it are thus much more powerful than such systems as 
the IBM PC and PC clones (based on the 8088), the AT&T 6300 and IBM Personal 
System/2 Model 30 (based on the 8086), and the IBM PC AT, AT clones, and the IBM 
Personal System/2 Models 50 and 60 (based on the 80286). Yet the 80386 can run MS- 
DOS programs and all others written for those earlier processors. The 80386 thus brings 
the power of a superminicomputer (such as the DEC VAX line) or a mainframe (such
as IBM’s 360/370 lines) to the chip level. An 80386-based computer, such as the IBM 
Personal System/2 Model 80 (Figure 1-1), can do more than computers of the 1970s 
that cost hundreds of thousands or even millions of dollars. 


80386 Programming Guide 







Figure 1-1 

The IBM Personal System/2 Model 80, an 80386-based computer.
Photo courtesy of IBM Corporation, Information Systems Group, 
Rye Brook, New York. 





KEY FEATURES 





The 80386 is more than just an expanded 8088 or 80286. Among its key features are:

- Memory capacity of 4 gigabytes (Gb or G). A gigabyte is 1,024
megabytes (approximately 1 billion bytes; see Table 1-1). This is 256
times the capacity of the 80286 and 4096 times that of the 8088. Figure
1-2 shows how the memory capacity of Intel processors has increased
with time. The slope of the curve is impressive, even with a logarithmic
vertical scale. The 8080’s 64K capacity is just barely above the horizontal
axis.

Table 1-1
Memory Units and Their Meanings

Unit             Number of Bytes       Prefix Meaning
Kilobyte (Kb)    1,024                 Thousand
Megabyte (Mb)    1,048,576             Million
Gigabyte (Gb)    1,073,741,824         Billion
Terabyte (Tb)    1,099,511,627,776     Trillion

Table 1-2
Four Gigabytes of Memory in Various Units

Unit                 Number Required
2-Mb board           2,048
640-Kb board         6,554
256-Kb board         16,384
1-megabit chips      32,768
256-kilobit chips    131,072


Four gigabytes is enough memory to remember the entire U.S. national debt, although
not enough to do anything about it. Besides, gigabyte is a new word with a great
sound. It is a major addition to one’s technical vocabulary, now that most people know
about kilobytes and megabytes.

Of course, although gigabytes sound good, they are not terribly practical at present.
Four gigabytes would occupy over 2000 2-Mb boards or over 6000 640K boards (see Table
1-2). To house that much memory, your computer would need a huge chassis, thousands
of empty slots, and a gargantuan power supply! But some ancient pioneers may remember
back to the bygone 1970s when megabytes were equally impractical, and 64K was
more memory than any reasonable person would ever need. For those who like to look
ahead, the next stage (see Table 1-1) is terabytes.
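
The arithmetic behind these figures is easy to check. The short Python sketch below is our own illustration, not part of any Intel tool. It assumes the binary units of Table 1-1 and the usual address widths of 20, 24, and 32 bits for the 8088, 80286, and 80386 (widths that this section does not state). The names units_needed, address_space, and boards_and_chips are hypothetical.

```python
# Sizes in bytes, using the binary units of Table 1-1.
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def units_needed(total_bytes, unit_bytes):
    """Return how many whole units are needed to hold total_bytes (rounded up)."""
    return -(-total_bytes // unit_bytes)


# Assumed address widths: 20 bits (8088), 24 bits (80286), 32 bits (80386).
address_space = {"8088": 2**20, "80286": 2**24, "80386": 2**32}

four_gb = 4 * GB

# Capacity in bytes of each board or chip; chip sizes are given in bits,
# so they are divided by 8 here.
boards_and_chips = {
    "2-Mb board": 2 * MB,
    "640-Kb board": 640 * KB,
    "256-Kb board": 256 * KB,
    "1-megabit chip": MB // 8,
    "256-kilobit chip": 256 * KB // 8,
}

if __name__ == "__main__":
    for name, size in boards_and_chips.items():
        print(f"{name:18s} {units_needed(four_gb, size):>9,}")
    print("80386 vs. 80286:", address_space["80386"] // address_space["80286"])
    print("80386 vs. 8088: ", address_space["80386"] // address_space["8088"])
```

Note that the 640-Kb case does not divide evenly, so the sketch rounds up to the next whole board.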

Figure 1-2
Processor memory capacity vs. time for Intel CPUs.

- Ability to run 8086 programs in their own environments. The 80386 can
switch back and forth between an 8086 mode and its own native operating
mode. An 80386-based computer can thus run popular MS-DOS
software without giving up access to more advanced features. This is a
big improvement over the 80286 processor, which cannot run MS-DOS
programs in its native (protected) operating mode. Many 80286 features,
such as its 16 Mb of addressable memory, are therefore inaccessible to
MS-DOS users. OS/2, which runs on 80286- and 80386-based computers,
gives access to 80286 features but not to those new to the 80386.

- Support for operating systems that run several tasks at once (called
multitasking). The 80286 and 80386 both have special instructions, data
structures, and other features intended for such systems. Multitasking is
important in personal computers and in such applications as computer-aided
design (CAD), computer-aided manufacturing (CAM), computer-aided
engineering (CAE), robotics, artificial intelligence, industrial
control, process control, military and aerospace systems, instrumentation,
and workstations.

- Ability to address single units or blocks of memory as large as 4 Gb. We
call such addressable units of program or data memory segments. On the
80386, even large programs can fit in a single segment. This avoids the
extra programming and machine time required to check for segment
boundaries and move from one segment to another.

- Support for virtual memory. That is, the 80386 can readily address more
memory than is physically present. The operating system can move data
and programs to and from disk as needed. The programmer does not have
to manage physical memory systems that may vary with computer model
or be expanded as time passes.

- Special on-chip hardware for shifting, multiplication and division, and
address generation. This new hardware makes instructions execute faster.
For example, a device called a barrel shifter can shift up to 64 bit positions
in a single clock cycle. Thus all shifts take the same amount of time,
regardless of the number of positions involved. Long shifts are common
in graphics, communications, compiling, image processing, and string
processing. The overlapped execution of address calculations (such as
indexing) is another major reason for the 80386’s improved performance.
Even the most complex addressing mode now takes only one extra clock
cycle, as compared to up to twelve on earlier processors.
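
To see why a barrel shifter takes the same time for every shift count, consider the following Python model. It is only a sketch of the general technique, not a description of the 80386's actual circuit. We assume a 64-bit operand and shift counts of 0 through 63; the names WIDTH, barrel_shift_left, and serial_shift_left are our own. Each stage shifts by a fixed power of two and is switched on or off by one bit of the count, so every shift passes through the same six stages.

```python
WIDTH = 64  # operand width assumed for this model


def barrel_shift_left(value, count):
    """Shift a WIDTH-bit value left by count using fixed power-of-two stages.

    Returns (result, stages). The number of stages depends only on WIDTH,
    never on count.
    """
    if not 0 <= count < WIDTH:
        raise ValueError("count must be between 0 and WIDTH - 1")
    mask = (1 << WIDTH) - 1
    stages = 0
    stage_shift = 1
    while stage_shift < WIDTH:
        # Each stage either passes the value through or shifts it by
        # stage_shift positions, depending on one bit of the count.
        if count & stage_shift:
            value = (value << stage_shift) & mask
        stage_shift <<= 1
        stages += 1
    return value, stages


def serial_shift_left(value, count):
    """Shift one position per step, as a simple one-bit shifter would.

    Returns (result, steps). The number of steps grows with count.
    """
    mask = (1 << WIDTH) - 1
    steps = 0
    for _ in range(count):
        value = (value << 1) & mask
        steps += 1
    return value, steps


if __name__ == "__main__":
    for count in (1, 5, 63):
        print(count, barrel_shift_left(1, count)[1], serial_shift_left(1, count)[1])
```

In hardware, all the stages lie along one combinational path, which is how the whole shift can fit in a single clock cycle. The one-bit shifter, by contrast, needs one step per position.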

The 80386 has other new features as well. For instance, it has debug registers for 
use in program development. It also has instructions for bit manipulation, bounds 
checking, and module handling. Improved hardware and instruction facilities greatly 
speed up arithmetic operations, shifts, and addressing. Externally, the 80386 has a new,
more powerful 32-bit floating point unit, the 80387 numeric data processor (see Ap- 
pendixes G and I). 

The 80386 has a more generalized architecture than its predecessors. A frequent 
criticism of the 8086 and 80286 processors is their dedication of many registers to 
specific purposes, such as accumulators, indexes, and base addresses. The 80386 allows
users to ignore much of this specialization, although they can take advantage of
it if they wish. More consistency is particularly helpful to compiler writers, whose
programs must translate high-level computer languages into low-level executable code
(machine language). The more consistent the architecture, the easier it is for a compiler
to do automatic translation without any special analysis or concern for context.

VIRTUAL 8086 MODE

A major improvement in the 80386 is its new virtual 8086 (V86) mode. This mode 
gives the 80386 its ability to run both 8086 programs and new programs requiring ad- 
vanced features. Like the 80286, the 80386 has two main operating modes: 

- Real mode, in which it acts like an 8086. That is, it behaves like a 16-bit processor
with access to 1 Mb of memory.

- Protected (virtual) mode, in which it has access to all its features and facilities. That
is, it behaves like a 32-bit processor with access to 4 Gb of memory.

The problem is the difficulty of switching between real and protected modes. The 
two conflict, and only one can be active at a time. Unfortunately, as we mentioned, 
MS-DOS does not run in the protected mode on the 80286 (or the 80386). Thus we 
must give up either MS-DOS software or protected mode features. 

The 80386 adds an intermediate stage called virtual 8086 mode. It is an option within
the protected mode, selected by a flag. Virtual 8086 mode does not conflict with the
full protected mode. In it, the processor acts like an 8086. However, some 80386 features
remain active, under the control of a special program called a virtual 8086
monitor. The result is that 8086 programs can coexist with programs using advanced
80386 features and running in protected mode. Virtual 8086 mode thus provides a 
bridge between MS-DOS software and new 80386 features. 

In fact, in the virtual mode, an 80386 can behave like several 8086s, each with its 
own operating system, programs, and memory areas. Users may think they have 8086s
for their own private use, even though they are actually sharing an 80386-based 
machine. We call such a simulation a virtual machine. 

A key point here is that in V86 mode, the 80386 executes programs just like an 8086.
It does not emulate the 8086. That is, it does not translate 8086 instructions into its own
native mode. Emulation would make an 80386 run slower than an 8086. In fact, the
80386 runs faster because of its higher clock speed, more extensive pipelining, and
extra arithmetic hardware.


PIPELINED ARCHITECTURE 





Figure 1-3 shows the functional units inside the 80386 microprocessor. The key point
is that all units operate at the same time. That is, they do their jobs simultaneously on 
different operands. We call this approach pipelining, since it works like a pipeline that 
moves things continuously rather than in discrete units. 

Figure 1-3
80386 functional units.


You may compare a computer pipeline with an automobile assembly line. The line 
does not finish one car and then start another. Instead, it works on many cars at once.
One car may be having its doors welded, another its motor inserted, a third its windows 
attached, and a fourth its paint applied, all at the same time. Clearly, production is much 
higher if several workstations operate simultaneously. 

An 80386 has six distinct units or workstations, as Figure 1-3 shows. They are, starting
at the far right and moving clockwise:


- Bus interface unit
- Code prefetch unit
- Instruction decode unit
- (Instruction) execution unit
- Segmentation or segment unit
- Paging or page unit

The units’ functions are: 

- The bus interface unit reads instructions from memory (called instruction
fetch) and transfers data to or from memory and I/O devices. It is
thus the processor's external traffic manager.

- The code prefetch and instruction decode units save the instructions and
determine their meanings (a process called decoding). Both units have
storage places (called queues) that hold instructions in the proper order
until the next unit needs them. The prefetch unit has a 16-byte queue, and
the decode unit has a three-instruction queue.

- The execution unit performs the operations specified by the instructions.
It does arithmetic, logic, shifting, and other functions. It also manages
the on-chip temporary storage (registers).

- The segmentation and paging units together translate the memory addresses
to which programs refer (called logical addresses) into actual or
physical addresses. Chapter 5 explains the translation in detail.

All units can (and usually do) operate simultaneously. For example, the 80386 can 
execute an instruction, compute a memory address, decode an instruction, transfer data
to or from memory, and do other jobs at one time. Of course, like the automobile as- 
sembly line, the 80386’s pipeline works at full speed only if all inputs are available 
when units need them. 

At top speed, the 80386 is working on several instructions at the same time. Then it 
need not wait for one part of an operation to finish before starting another. Of course, 
it takes a while to fill the pipeline at first. 

Figure 1-4 compares pipelined operation to the sequential operation of early processors
such as the Intel 8080. The leftmost part of the figure shows how the pipeline fills
initially. Instruction 1 works its way through the empty units much as it does in the
sequential processor at the top. However, while the pipelined processor is executing
instruction 1, it is also fetching instructions 3 and 4 and decoding instruction 2. At this
point, the pipeline is full, and all units are busy. By the end of the interval, the pipelined
processor has executed four instructions as compared to the sequential processor’s two.
Furthermore, the pipelined processor has fetched instructions 5 and 6 and decoded
instruction 5.
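
The four-versus-two count can be reproduced with a very simple timing model. The Python sketch below is our own simplification, not a cycle-accurate picture of the 80386. It assumes only three stages (fetch, decode, and execute), one time unit per stage, and an interval of six time units; these values are assumptions chosen for illustration, and the function names are hypothetical.

```python
STAGES = 3  # fetch, decode, execute: a simplification of the six units


def sequential_completed(time_units):
    """Instructions finished when each one passes through every stage
    before the next one starts."""
    return time_units // STAGES


def pipelined_completed(time_units):
    """Instructions finished when a new one enters the pipeline every
    time unit, after an initial fill of STAGES - 1 units."""
    return max(0, time_units - (STAGES - 1))


if __name__ == "__main__":
    for t in range(1, 7):
        print(t, sequential_completed(t), pipelined_completed(t))
```

Under these assumptions, the sequential processor finishes two instructions in six time units and the pipelined processor finishes four. The gap widens as the interval grows, since the pipeline's fill time is paid only once.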

Figure 1-4
Instruction pipelining.

A pipelined processor requires special programming techniques for optimal performance.
Slowdowns occur whenever the pipeline must be refilled. The usual reason is a
transfer of control that forces the processor to clear its queues and start executing
instructions somewhere else in memory. The programmer must minimize the number of
jumps to keep this from happening too often. Software theorists will be glad to hear
that the 80386 makes GOTOs highly undesirable. They are not only harmful to program
structure, but they also slow down the hardware.
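
A rough estimate shows how much jumps can cost. The Python sketch below again assumes a simplified three-stage pipeline with one time unit per stage, and it assumes that every taken jump empties the pipeline completely. Real 80386 timings depend on the instructions involved, so the numbers are illustrative only. The name time_with_jumps is hypothetical.

```python
STAGES = 3  # assumed pipeline depth for this estimate


def time_with_jumps(instructions, taken_jumps):
    """Estimate total time units for a run of instructions.

    The pipeline needs STAGES - 1 units to fill at the start, finishes one
    instruction per unit while full, and must be refilled after every
    taken jump.
    """
    if instructions == 0:
        return 0
    refill = STAGES - 1
    return refill + instructions + taken_jumps * refill


if __name__ == "__main__":
    print(time_with_jumps(100, 0))   # straight-line code
    print(time_with_jumps(100, 20))  # one taken jump every five instructions
```

With these assumptions, 100 instructions of straight-line code take 102 time units, whereas the same number of instructions with 20 taken jumps takes 142.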


TYPICAL APPLICATIONS 


What applications need the 80386’s capabilities? Leading areas include:

- Personal computers
- CAD/CAM/CAE (computer-aided design/manufacturing/engineering) systems
- Robotics
- Artificial intelligence
- Signal processing

Other 80386 applications include laser printers, graphics, desktop publishing, process
control, industrial control, test equipment, instrumentation, communications equipment
(such as network controllers or servers), office automation, banking terminals,
guidance and control, and military systems. Let us briefly discuss why such applications
need the features and power of an 80386.

Figure 1-5
The Compaq Deskpro 386. Photo courtesy of Compaq Computer
Corporation, Houston, Texas.

Personal Computers


There are several markets for more capable personal computers such as the Compaq 
Deskpro 386 shown in Figure 1-5. One market is the so-called power users. They are 
people whose documents, spreadsheets, databases, or other applications require more 
memory and processing power than is currently available. In fact, they need more of 
everything, regardless of what is available. And they will always need more, regard- 
less of what advances occur. 

Power users include writers who want to check spelling and grammar, produce in- 
dexes and tables of contents, and process documents that are only slightly longer than 
the Encyclopedia Britannica. Who reads the output? Probably no one, but it keeps
editors, typists, typesetters, printers, and librarians busy. It also fills government 
repositories and satisfies contracts and regulations. 

Other power users are financial analysts whose models cover thousands of cells, 
compute complex formulas, and combine data from hundreds of sources. The result is 
an almost infinite number of factors that you can adjust to get the results you already 
know are correct. Still other power users have huge databases, large projects to manage, 
complicated graphics to create, or complex functions (such as product mixes or 
transportation costs) to optimize. 

Another market consists of scientists, engineers, and programmers who want to use 
personal computers as low-cost workstations. Solving huge sets of differential equa- 
tions, statistical analysis, simulation of complex phenomena, and compiling long 
programs written in high-level languages are typical tasks requiring large amounts of 
computing power. The needs are particularly great if the user wants interactive opera- 
tion or an intelligent interface, online help, macro or batch file capabilities, extensive 
graphics, and databases of common procedures and results. 

Such workstations are useful even when the overall problem requires a supercom- 
puter to solve. Using a workstation for program development, data entry, and analysis 
of results reduces the burden on the large machine. It also results in lower cost and 
greater convenience. 

Still another market consists of people who want low-cost multiuser capabilities.
Figures 1-6 and 1-7 show 80386-based multiuser systems from Integrated Business
Computers and Prime Computer, respectively. Owners of such systems may want one
terminal for data entry and one for billing. Or they may want several terminals for order
entry, interviewing, information retrieval, or financial reporting. The key is that all
users must share a common database, so several personal computers will not do the
job. We might call this subdepartmental computing, since it involves units smaller than
the departments handled by minicomputers. A doctor’s office, a pharmacy, a realtor’s
office, or a restaurant would be typical applications.

Figure 1-7
The Prime EXL 316 supermicrocomputer. Photo courtesy of Prime
Computer, Inc., Natick, Massachusetts.

An 80386-based computer can also provide MS-DOS capabilities to users of other 
machines. For example, Logicraft’s 386WARE (Figure 1-8) allows DEC VAX users 
to run MS-DOS programs from their terminals. Here the 80386-based machine is both 
a file server and a working computer. 

Figure 1-8
The Logicraft 386WARE multiuser DOS server for DEC VAX networks.
Photo courtesy of Logicraft, Inc., Nashua, New Hampshire.


CAD/CAM/CAE Systems 


CAD/CAM/CAE systems also require large amounts of processing power. A typical 
application is the Daisy Systems Personal Logician 386 shown in Figure 1-9. Although 
the name makes it sound like the perfect tool for Aristotle’s students, it is actually an 
80386-based engineering workstation used to design, simulate, and test integrated cir- 
cuits. Figure 1-10 shows another typical application. Here an architect is using an 
80386-based personal computer to design a building. CAD/CAM workstations like 
these must be able to: 
- Display complex objects. Many systems must do tasks such as rotation,
shading, mirroring, scaling, and zoom.

- Manipulate large libraries of standard parts, drawings, images, shapes,
waveforms, and procedures.

- Perform complex procedures (algorithms) such as routing, curve fitting,
optimization, worst-case analysis, filtering, and reliability or stability
analysis.

- Provide the ability to easily vary parameters for rapid “what-if” analysis.
For example, suppose a planner was using a computer as shown in Figure
1-10 to plot a subdivision. He or she might want to try several alternatives
and compare their costs, profits, environmental effects, and other
features.

- Simulate systems to estimate their performance and see how they work
under stress, transients, or failure modes.

- Create test programs and patterns.

- Allow users to easily manipulate complex objects, graphics, and systems.

- Generate parts or wire lists, schematics, blueprints, bills of material,
database entries, flowcharts, and other documentation.

- Handle specialized design languages, databases, and other packages.

- Provide familiar interfaces such as those of common test instruments,
popular programs, or manual systems.

Figure 1-10
An IBM Personal System/2 Model 50 computer used to plot a subdivision.
Photo courtesy of IBM Corporation, Information Systems
Group, Rye Brook, New York.

Typical application areas include vehicles, electronic circuits, assembly lines, com- 
puter and communications networks, mechanisms, buildings and other structures, 
processing plants, chemistry, microbiology, and genetic engineering. CAD/CAM sys- 
tems can also aid surgeons, pharmacists, urban planners, geologists, power engineers, 
motion picture makers, aircraft designers, and civil engineers. Here again, extensive 
graphics (particularly three-dimensional color), user interaction, and checking of in- 
puts and results add to the computational load. 

An 80386-based CAD/CAM/CAE workstation (such as the ones shown in Figures 
1-1, 1-9, and 1-10) can provide extensive capabilities at low cost. Besides, it can also 
run popular word processing, spreadsheet, and database management software. The 
ability to run multiple operating systems is important here, since design tools often run 
under Unix, whereas personal computer programs run under MS-DOS. For example, 
Daisy Systems’ Personal Logician 386 (Figure 1-9) runs MS-DOS as a task in a 
modified Unix environment. Design tools can then use Unix’s 16-Mb address space 
instead of being restricted to MS-DOS’ 640K. 


Robotics 


Robotics also requires a large amount of low-cost processing power. The more processing
power a robot has, the more extensively it can analyze its surroundings and predict
the consequences of its actions. If a robot is to recognize objects by vision or touch, 
move in a crowded environment containing obstacles, understand natural language 
commands, or do complex operations, it must have a substantial computer. It may also 
have to remember a large amount of data and many procedures, combine data from 
many sources, and learn from what it does. The more processing power it has, furthermore,
the more the robot can do on its own without constant supervision by an operator.
Other things that a smarter robot can do include: 


- Communicate with other robots and coordinate activities with them
- Perform self-test, self-checking, and self-diagnosis procedures
- Provide local facilities for modifying and developing programs
- Perform movements involving more degrees of freedom, closer
tolerance, and higher resolution
- Keep detailed records of the results of previous operations
- Recover from errors and system failures
- Use radar and sonar for navigation


Artificial Intelligence 


Artificial intelligence (AI) is still another application area requiring extensive processing
power. The more processing power a computer has, the better it is at such AI tasks as:


- Evaluating the many alternatives involved in expert systems
- Implementing such techniques as inference, commonsense reasoning,
backward chaining, and goal seeking
- Performing complex procedures involving backtracking, extensive
decision logic, and many relationships
- Compiling programs written in AI languages such as LISP and Prolog,
as well as in specialized high-level packages
- Handling uncertainty in data, procedures, and requests for information


A more powerful computer can also access larger knowledge databases more quick- 
ly, provide more extensive tools, and consider a wider range of alternatives. It may 
even be able to respond to simple situations in real time. Typical application areas in- 
clude medical diagnosis, financial planning, vehicle repair, program interfaces, circuit 
and chip design, weather forecasting, automatic programming, genetic engineering, 
planning systems, and robotics. 

Figure 1-11
The Image Data Photophone, a device for sending still images on
telephone lines. Photo courtesy of Image Data Corporation, San Antonio,
Texas.


Signal Processing 


Signal processing has long been an application requiring a lot of computing power. 
Typical uses include: 
- Enhancing images derived from satellites, remote cameras, or underwater
photography. Is that dark spot in the field a scarecrow or a missile?
Military intelligence surely wants to know.

- Interpreting data received from radar and sonar systems. For example,
fighter pilots want to know quickly whether an object is an enemy aircraft
or a large bird. Similarly, ship captains want to quickly distinguish an
Exocet missile from an oil tanker or a great white shark.

- Filtering of noise from images and communications channels.

- Compressing images for efficient storage and transmission. Both time
and cost are important factors here.

- Comparing waveforms for speaker identification, recognition of irregularities,
removal of known or baseband sources, or isolation of changes.

- Analysis of visual data for use by robots.

- Speech recognition and synthesis.

- Interpreting, storing, transmitting, and displaying medical images such
as X-rays, CAT scans, and NMR scans.

- Vibration analysis for determining the failure modes of structures, such
as buildings, bridges, and nuclear reactors.

- Music or sound synthesis and production.

A typical image processing application is the Image Data Photophone shown in 
Figure 1-11. This system accepts input from a camera and sends the still images over 
a telephone line. It can also receive and print images. More processing power in a system
like this would allow it to hold more data, enhance images, provide graphical editing,
and transmit images faster. Image processing applications are of particular interest
to geologists, the military, health care professionals, graphic artists, quality control
specialists, and communications analysts.


KEY CONCEPTS


We must understand the following new concepts to appreciate the 80386: 

• Virtual memory

• Multitasking

• Multiuser systems

• High-level language and operating-system-oriented machines

While these ideas are new to microprocessors and personal computers, mainframes,
minicomputers, and specialized computers have used them for many years. Many of
the 80386’s features are clearly derived from larger machines.


Virtual Memory


Virtual memory refers to systems in which the combined size of the program and data 
areas may exceed the physical memory. The operating system keeps the currently used 
parts of the program and data in memory. It saves the rest on disk until it is needed. 
Before the computer accesses a location, a memory management unit (MMU) deter- 
mines whether it is actually in memory. If it is, the transfer proceeds as usual. All the 
MMU must do is convert the logical addresses to which programs refer into physical 
addresses that identify actual memory locations. The 80386 has its MMU on board. It 
consists of the segmentation and paging units shown in Figure 1-3. 

What happens if a location is not available? Then the MMU reports an error (called
a page fault). The operating system must take control and load the required area from
disk into memory. The virtual memory system thus acts like a clerk in a department
store. He or she fills an order from the shelves if possible. If not, the next step is
to request the items from a stockroom or warehouse. The size and style you want, of
course, are only available in the branch store in Outer Mongolia.

Why would you want virtual memory? First, it lets us write programs that refer to
large amounts of memory. The operating system can handle the details of moving areas
between memory and disk. As far as the programmer is concerned, memory and disk
are a single continuous unit. This approach is much simpler than having a program do
the transfers explicitly (a method called overlays).

Virtual memory has other advantages as well. The same program can run on many
computers, regardless of how much memory they have or how it is arranged. You need
not revise a program to use larger memory units, less expensive memory, or a new
computer model. Virtual memory also simplifies multiuser systems. Each user
can refer to the entire memory. The memory management unit isolates users by
keeping each one’s address space (program and data areas) separate.

There are two major ways to implement virtual memory. One is by dividing program 
and data areas into logical units called segments. Different segments can then go into 
different areas of virtual memory. Segmentation is particularly convenient for operat- 
ing systems, since they generally must assign areas for each program and its data 
anyway. However, segmentation is a nuisance to programmers, since they must divide 
their code and worry about intersegment transfers and segment boundaries. The over- 
head also makes programs run more slowly. The 80386 uses segmentation but allows 
large enough segments so that most programs need not struggle with it. Note that the 
sizes and numbers of segments depend on individual programs, not on the underlying 
physical memory. 


Table 1-3
Page Table for the Example Memory System
(Figure 1-12)

Frame Number    Page Number
     1               2
     2              13
     3              27
     4               6
     5              38
     6              51
     7              18
     8               5

The second approach is paging. Here, the operating system divides memory into
areas of fixed size called pages. Disk operations transfer one page at a time. The
memory management unit must only determine whether a particular page is currently
in memory. If not, the unit must fetch it from disk. Unlike segmentation, paging is
invisible to the programmer. Programs and data can extend over many pages with no
problems. The 80386 allows paging but does not require its use.

How does paging work in practice? Figure 1-12 shows a simple example of a paged 
virtual memory system. The addressable memory (from a program’s point of view) 
consists of 64 pages. However, the physical memory consists of only eight page-sized 
units or frames. Think of frames as slots into which we can put a page of virtual memory 
(or address space). Table 1-3 lists which pages currently occupy the frames. 

It is quite simple to access a page that is in memory. For example, suppose that a
program’s next instruction is on page 2; fetching it is straightforward. Page 2 is in
memory, and all the processor must do is find it (through a procedure called address
translation). This involves looking up the page number in Table 1-3 (the page table)
and reading the corresponding frame number (1 in the current case).

Note that the pages are not ordered. The processor must use the page table to
determine whether a page is in memory and, if it is, which frame it is currently
occupying.
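
The lookup just described can be sketched in a few lines of Python. This is only an illustration, not the 80386's actual mechanism: the names PAGE_TABLE, PageFault, and translate are hypothetical, and the dictionary simply records the frame and page pairs of Table 1-3, keyed by page number.

```python
# Page number -> frame number, taken from Table 1-3.
# Only pages currently in physical memory have an entry.
PAGE_TABLE = {2: 1, 13: 2, 27: 3, 6: 4, 38: 5, 51: 6, 18: 7, 5: 8}


class PageFault(Exception):
    """Raised when the requested page is not in physical memory."""


def translate(page):
    """Return the frame that holds `page`, or raise PageFault."""
    try:
        return PAGE_TABLE[page]
    except KeyError:
        raise PageFault(page) from None


print(translate(2))    # page 2 occupies frame 1
```

Because the table is keyed by page number, the unordered arrangement of pages in the frames does not matter to the lookup.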

What happens if the next program instruction is on page 33? Now we have a problem,
as page 33 is not in memory. A page fault therefore occurs. The operating system then
takes control and must do the following:

1. Make room for page 33 in memory by removing a page (say, 38) and saving it on
disk.

2. Load page 33 into memory in the frame (5) formerly occupied by page 38.

3. Change the page table to indicate that frame 5 now contains page 33, not page 38.

4. Return control to the program that caused the fault.


Obviously, recovery from a page fault is a complex process that may take a long time. 
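
The four recovery steps can be traced with a small Python sketch. It is a simplified model under stated assumptions: the names frames, disk_writes, disk_reads, and handle_page_fault are hypothetical, and two lists stand in for the real disk transfers.

```python
# Frame number -> page number, as in Table 1-3.
frames = {1: 2, 2: 13, 3: 27, 4: 6, 5: 38, 6: 51, 7: 18, 8: 5}
disk_writes = []    # pages saved to disk (stand-in for real output)
disk_reads = []     # pages loaded from disk (stand-in for real input)


def handle_page_fault(wanted, victim):
    """Replace page `victim` with page `wanted`; return the reused frame."""
    frame = next(f for f, p in frames.items() if p == victim)
    disk_writes.append(victim)    # step 1: save the removed page on disk
    disk_reads.append(wanted)     # step 2: load the required page
    frames[frame] = wanted        # step 3: update the page table
    return frame                  # step 4: control returns to the program


frame = handle_page_fault(wanted=33, victim=38)
print(frame, frames[frame])       # frame 5 now holds page 33
```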

The situation may be even more involved. A page fault may occur in the middle of 
an instruction. For example, an add instruction may require data from memory. If ob- 
taining the data causes a fault, the processor must suspend the add instruction while 
the operating system fetches the required page. The processor must then either start the 
instruction all over again (called restart) or resume it. Resuming an instruction requires 
extra storage and logic, since the processor may have to save many intermediate results. 

Note that one instruction could cause several page faults. For example, suppose it
moves data from one memory location to another. The original instruction fetch could
cause a fault, and each data transfer could cause another. The 80386 can restart all
instructions from any point in their history at which a page fault could occur.
However, it cannot resume instructions.

Although virtual memory has advantages, it also introduces new problems. Clearly,
moving pages to and from memory (a process called swapping) takes time. A
program will run slowly if it causes many page faults. We say that such a program
(rather descriptively) is thrashing. The problem is clearly the same as maintaining
shelf inventory in a store. You want to use your limited shelf space wisely to minimize
the number of trips back to the stockroom. The usual solution, of course, is to keep
the most popular sizes and styles immediately available.

How do we use a limited memory space wisely? The common method is to only 
load a new page when required. We call this a demand paged system. Of course, to 
make room for a new page, the system must store an old one on disk. A common ap- 
proach is to remove the page that has gone the longest time without being used. We 
call this approach a least recently used (LRU) algorithm. To implement it, the operat- 
ing system must have a way of measuring page usage over time. 

In practice, most programs refer to a few pages repeatedly (called a working set).
Once the system has brought a program’s working set into memory, it should execute
without further swaps. For example, a program using the memory system in Figure 1-12
might have its main code on pages 2, 5, and 6 and its data on page 51. As long as it
did not refer to other pages, no swaps would be necessary to execute it.
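
A short simulation may make demand paging, LRU replacement, and thrashing more concrete. The sketch below is an assumption-laden model, not an operating system routine: count_page_faults and the reference string refs are hypothetical, and an ordered dictionary stands in for whatever usage records a real system keeps.

```python
from collections import OrderedDict


def count_page_faults(references, frame_count):
    """Simulate demand paging with LRU replacement; return the fault count."""
    resident = OrderedDict()    # pages in memory, least recently used first
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)      # mark as most recently used
            continue
        faults += 1                         # page fault: load on demand
        if len(resident) == frame_count:
            resident.popitem(last=False)    # remove least recently used page
        resident[page] = True
    return faults


# A working set of pages 2, 5, 6, and 51, plus one stray reference to page 33.
refs = [2, 5, 6, 51, 2, 5, 6, 51, 33, 2]
print(count_page_faults(refs, frame_count=8))   # 5: one fault per distinct page
print(count_page_faults(refs, frame_count=3))   # 10: every reference faults
```

With eight frames the working set fits and only the first reference to each page causes a fault. With three frames the same program thrashes.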


PHYSICAL MEMORY            DISK

FRAME 1    PAGE 2          PAGES 1-64
FRAME 2    PAGE 13
FRAME 3    PAGE 27
FRAME 4    PAGE 6
FRAME 5    PAGE 38
FRAME 6    PAGE 51
FRAME 7    PAGE 18
FRAME 8    PAGE 5


Figure 1-12
An example of a paged virtual memory system.


Of course, the page size must be reasonable for this situation to be likely. If the pages
are too large, the system cannot fit many of them into physical memory. Furthermore,
transferring a page to or from disk may take a long time. If pages are too small, the
system spends too much time dealing with them. Typical sizes for pages in actual
computers are 1K to 4K bytes. The 80386 uses a 4-Kb page. Note that page size is fixed,
whereas segment size can vary.
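
A fixed page size of 4K bytes means that the low 12 bits of an address select a byte within a page and the remaining bits select the page. The following Python sketch shows the arithmetic. The function name split_address and the sample address are hypothetical, and the sketch ignores how the 80386 actually organizes its page tables.

```python
PAGE_SIZE = 4096      # 4K bytes, the 80386 page size
OFFSET_BITS = 12      # 2**12 == 4096


def split_address(address):
    """Split a 32-bit address into (page number, offset within the page)."""
    return address >> OFFSET_BITS, address & (PAGE_SIZE - 1)


page, offset = split_address(0x00012ABC)
print(hex(page), hex(offset))     # 0x12 0xabc
print(2**32 // PAGE_SIZE)         # 1048576 pages in a 4-Gb address space
```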

Paging is not always desirable. Applications such as guidance and control that re- 
quire real-time response often cannot tolerate the delays caused by page swapping. One 
hardly wants a missile or spacecraft to careen off course because its controller needs 
extra time occasionally to obtain a new page from disk. Even less critical applications 
may require that some pages always reside in physical memory. These could include 
a real-time clock, a high-priority interrupt, and the page fault handler itself. Obvious- 
ly, the computer is in big trouble if activating the page fault handler causes a page fault. 

Virtual memory always adds complexity to a computer system. In the first place,
address translation takes extra time or hardware. The same holds for determining
whether pages are available and marking usage for swapping algorithms. Furthermore,
the operating system must set up the page tables and initialize registers and memory
locations. The 80386 provides many features intended to reduce this complexity and
overhead. We have already noted that it can do address translation in parallel with
other activities.


Figure 1-13
Dividing processor time among many tasks.


Multitasking 


Another key concept in 80386-based systems is multitasking. This refers to running
many tasks “at once,” usually by giving each one a slice of CPU time and suspending
those that must wait for input/output or other external events. Figure 1-13 gives a
crude idea of how multitasking works for a system running five tasks. The system
intertwines the execution of tasks, moving from one to another according to priority.
In most cases, systems give very high priority to short jobs. Thus the computer can
generally finish them quickly without greatly affecting the run times of long jobs.

Each task is an independent entity. It has its own program and data areas, startup
procedures, status, and priority. To run tasks efficiently, a computer must be able to:

• Switch rapidly from one task to another. This generally involves saving
and loading the entire machine state (registers and other facilities) as
shown in Figure 1-14.

• Keep tasks from interfering with one another, while still permitting
efficient communications.

• Resolve conflicts and prioritize operations.

The 80386 provides special structures and instructions for holding task status,
switching tasks, and creating local and global environments. These hardware facilities
greatly reduce the overhead for multitasking.


[Figure 1-14 contrasts two ways of switching from one task to another.

Traditional multitasking task switch process (traditionally implemented with
software):
  Save general-purpose registers
  Save stack pointer
  Save system registers
  Optionally save floating-point registers
  Load new task
  Load general-purpose registers
  Load system registers
  Set up link between old and new tasks
  Execute new task

80386 task switch:
  Operates in < 17 µsecs
  Looks transparent to the applications
  Up to 10X faster than software approach]


Figure 1-14
A task switch.
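
The save-and-load sequence behind a task switch can be modeled in Python as follows. This is a rough sketch only: the classes Task and Cpu and the short register list are hypothetical, and the model says nothing about the structures the 80386 itself uses.

```python
REGISTER_NAMES = ("EAX", "EBX", "ECX", "EDX", "ESP", "EIP")


class Task:
    def __init__(self, name):
        self.name = name
        # Saved machine state; a new task starts with zeroed registers.
        self.saved = dict.fromkeys(REGISTER_NAMES, 0)


class Cpu:
    def __init__(self, first_task):
        self.current = first_task
        self.registers = dict(first_task.saved)

    def switch_to(self, new_task):
        self.current.saved = dict(self.registers)    # save the old task's state
        self.registers = dict(new_task.saved)        # load the new task's state
        self.current = new_task


editor, printer = Task("editor"), Task("printer")
cpu = Cpu(editor)
cpu.registers["EAX"] = 1234     # the editor computes something
cpu.switch_to(printer)          # the printer runs with its own state
cpu.registers["EAX"] = 99
cpu.switch_to(editor)           # the editor resumes where it left off
print(cpu.registers["EAX"])     # 1234
```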


Most personal computer users first see the advantage of multitasking when waiting
for a long printout. Without multitasking, printing occupies the computer completely.
You cannot edit another file or do any other work until the printing is finished. The
effect is like waiting at the bank behind someone who has a mere 2,000 transactions to
complete. Other tasks that may tie up a computer for a long time include calculating a
large spreadsheet, compiling a long program, sorting a database, checking spelling or
grammar in a long document, and transferring a large file via a modem.

In multitasking, everyone gets a share of the available processing time (see Figure
1-13). This does not affect tasks such as editing or data input. After all, they would be
spending most of their time waiting for I/O anyway. For example, even the fastest typist
can enter only about 30 characters per second. The multitasking operating system
simply does not run such tasks until their next input is available or their most recent
output is finished. Meanwhile, other tasks can use time that would otherwise be wasted.
The situation is like having someone do other work while waiting for infrequent service
calls. The legendary Maytag repairman can get a lot done this way. And it has little
effect on the response time.

Multitasking also is useful in many situations not involving people. For example, a
controller for a nuclear power plant may have many jobs to do. It must log the plant’s
current status, respond to alarms, print reports, provide local displays, and perhaps
communicate with a central computer. Multitasking allows it to do all these things at
once.


“At once” is figurative here. The processor can actually do only one task at a time.
However, short time slots (like the ones shown in Figure 1-13) make the response
appear immediate to most users.
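
A toy round-robin scheduler shows how short time slots let brief jobs finish early even though only one task runs at a time. It is a minimal sketch under simplifying assumptions: the function round_robin is hypothetical, all tasks have equal priority, and none of them ever waits for I/O.

```python
from collections import deque


def round_robin(work, time_slice):
    """Run tasks in turn; `work` maps task name -> time units still needed.

    Returns the order in which the tasks finish.
    """
    ready = deque(work.items())
    finished = []
    while ready:
        name, remaining = ready.popleft()
        remaining -= min(time_slice, remaining)   # run for one time slot
        if remaining:
            ready.append((name, remaining))       # back to the end of the line
        else:
            finished.append(name)
    return finished


print(round_robin({"print": 5, "edit": 1, "sort": 3}, time_slice=2))
# ['edit', 'sort', 'print']
```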

Multitasking has a further advantage in many situations. It puts distinct functions in
separate, isolated units. The programmer can then readily change one without
affecting the others. For example, you might want to change keyboards, add a faster
printer or one with color graphics, or replace a communications protocol or an
optimization method. If the tasks are truly independent, you should only have to change
one of them. This isolation would also apply if some tasks differed, depending on the
computer model, operating mode, or optional attachments.

The cost of multitasking is generally quite low. Of course, the operating system uses 
some processing time for task management. However, in practice, most tasks spend 
much of their time (perhaps 80 percent or more) waiting for I/O. Multitasking frees 
this time for productive use. Of course, systems get bogged down if they have too many 
tasks or if tasks are compute-bound rather than I/O-bound. Overall, multitasking makes 
all tasks take somewhat longer to execute, but it allows the computer to do much more 
useful work. 
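
One rough way to see why I/O-bound tasks share a processor so well is a simple probabilistic estimate. The model below is an assumption introduced here for illustration, not something the text derives: it treats each task as waiting for I/O independently, so the processor is idle only when every task is waiting at once. The function name cpu_utilization is hypothetical.

```python
def cpu_utilization(io_wait_fraction, task_count):
    """Estimate the fraction of time the processor is busy.

    Assumes each task waits for I/O independently of the others.
    """
    return 1 - io_wait_fraction ** task_count


for tasks in (1, 2, 5, 10):
    print(tasks, round(cpu_utilization(0.8, tasks), 3))
```

With tasks that wait 80 percent of the time, one task keeps the processor busy about 20 percent of the time, while five such tasks raise the estimate to about 67 percent.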

Multitasking does require some arbitration. Two tasks cannot both use the same
I/O device or memory area. One must wait for the other to relinquish control. But what
happens if each has something the other needs and neither will yield? The operating
system must resolve the conflict (called deadlock). Another problem is a task that ties
up resources and then dies. Perhaps it tried to divide by zero or commit some other
unforgivable crime. The operating system must remove the carcass (a nice image!) and
free the resources. The operating system may also have to intercept outputs and request
inputs to avoid conflicts over I/O devices.
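
Deadlock can be pictured as a circular chain of waiting tasks. The Python sketch below looks for such a circle. It is an illustration only: find_deadlock is a hypothetical name, and the model assumes each blocked task waits for exactly one other task.

```python
def find_deadlock(waits_for):
    """Return a list of tasks forming a cycle in `waits_for`, or None.

    `waits_for` maps each blocked task to the task holding what it needs.
    """
    for start in waits_for:
        seen = [start]
        task = waits_for.get(start)
        while task is not None and task not in seen:
            seen.append(task)
            task = waits_for.get(task)
        if task == start:       # the chain of waits leads back to the start
            return seen
    return None


# The printer task holds the printer and wants the disk; the disk task holds
# the disk and wants the printer. Neither will yield.
print(find_deadlock({"printer_task": "disk_task", "disk_task": "printer_task"}))
print(find_deadlock({"editor": "printer_task"}))    # None: just an ordinary wait
```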


Multiuser Systems 


The 80386 also simplifies the implementation of multiuser systems. Although
single-user systems have become more popular in recent years as computer costs have
decreased, multiuser systems are still necessary in some situations. In particular, a
multiuser system allows many people to share a common database such as the records
for a company, hospital, school, or government agency. Clearly, having everyone keep
their own copies of common records would lead to needless confusion and repetition.
Multiuser systems are also essential for online transactions, such as reservation
systems, order entry, and banking. Here the system’s main purpose is to provide rapid
access to common records.

Multiuser systems raise many new problems. Each user must get a slice of time much
as in multitasking systems. In a sense, we can regard each user as a task. However, the
key element is access to shared data. We need ways to provide this access without
allowing the data to be corrupted, read while it is being updated, or accessed improperly.
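
One common way to keep shared data from being corrupted is to let only one user update a record at a time. The Python sketch below uses a lock from the standard threading module for this purpose. It is an illustration in a high-level language, not a description of 80386 facilities; the names balance, balance_lock, and deposit are hypothetical.

```python
import threading

balance = 0                         # the shared record
balance_lock = threading.Lock()     # guards every update to the record


def deposit(amount, times):
    global balance
    for _ in range(times):
        with balance_lock:          # only one user may update at a time
            balance += amount


users = [threading.Thread(target=deposit, args=(1, 10_000)) for _ in range(4)]
for user in users:
    user.start()
for user in users:
    user.join()
print(balance)                      # 40000
```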


Protection 


Virtual memory, multitasking, and multiuser systems all require protection for key data 
and operating software. Users cannot be allowed to interfere with processes that affect 
other people. 

We may compare shared computer facilities with a carpool. Fortunately, sharing a 
computer works better in most cases, although the problems are similar. If you drive 
to work by yourself, how your car runs, when you come and go, and what route you 
take are all generally your own business. A flat tire or a dead battery inconveniences 
only yourself. Furthermore, you can make whatever side trips or extra stops you want. 

When you become part of a carpool, however, the situation changes. Now a flat tire 
inconveniences everyone. Riders do not generally appreciate side trips, extra stops, and 
late arrivals or departures. Everyone must give up some freedom for the benefit of the 
group. This is why you don’t see many successful carpools in practice. 

Shared facilities require an administrator who provides centralized control and
recordkeeping. This applies to multiuser computers, networks, shared copying
machines or printers, common areas in apartments or condominium complexes, and
public parks.

The situation with operating systems is similar. In the single-user single-task en- 
vironment, you can do almost anything you want. Even overwriting the entire operat- 
ing system hurts no one but yourself. Furthermore, the operating procedures are usually 
simple enough so that you can readily get around the system’s limitations. With a lit- 
tle experience, of course. If things do not work out, you can always reset the machine 
or turn it off and start over. 

The multitasking or multiuser environment is different. One task or user cannot be
allowed to disrupt others or spy on them. Resetting the computer or turning it off is no
longer a reasonable solution to a runaway program. It is, however, a great way to meet
other users and raise the general level of local hostilities.

What we need is a way to protect programs and data. We must be able to protect
operating system software from users, and one user’s programs and data from other
users. The 80386 provides mechanisms for doing both of these. It thus is well-suited
to multiuser and multitasking systems. Furthermore, its hardware protection
mechanisms take little or no extra time.


Support for High-Level Languages 


Another feature of the 80386 is its support for high-level languages. This is surely
essential when we are considering memory capacities in the gigabyte and terabyte range.
Presumably, no one will write even a mere gigabyte of assembly or machine code. It
would take almost forever to code, debug, and test. It would also require a battalion of
programmers to maintain. For that matter, writing a gigabyte of code in a relatively
simple high-level language such as BASIC or C would be a formidable job.

How do we make high-level languages run more efficiently? Among the
requirements are:

• Ready access to the stack for loading and saving operands. Compilers
use the stack because it is ordered and easy to expand.

• A wide variety of indexed and indirect addressing modes. Compilers
need these to obtain data through variable pointers.

• Ability to handle many data types. Compilers must be able to deal with
bytes, words, double words, bits, bit fields, floating point numbers, and
other formats.

• Automatic verification of restrictions such as array bounds, stack limits,
and reading and writing limitations.

• Clear separation of program, data, and stack areas.

• A consistent structure that simplifies code generation.

The 80386 provides all these capabilities and features in a convenient package.


COMPARISONS WITH PREVIOUS PROCESSORS 


The 80386 processor is fully compatible with the 8086, 8088, and 80286 processors.
The major advances, as mentioned earlier, are:

• 32-bit facilities and data paths throughout

• Greatly expanded memory capacity

• Added hardware to allow more parallel operations through greater
pipelining

• Special hardware to speed up shifting, multiplication and division, and
address calculations

Overall, we may estimate the 80386’s performance at 16 MHz as follows:

• Three times that of an 80286 running at 8 MHz

• Ten times that of an 8086 or 8088 running at 6 MHz

This is a rough guide only, and your actual mileage may vary. Seriously, the
multiplying factor depends on the application, the instruction mix, and the speed of the
computer’s other components (such as memory and I/O). For example, early
80386-based CAE workstations draw complex screens, connect components, and transfer
schematics from disk to screen 2 to 4 times as fast as their 80286-based predecessors.

As mentioned previously, the 80386 has 256 times the physical memory capacity
of the 80286 and 4096 times that of the 8086 and 8088. Its virtual memory capacity is
64K times that of the 80286 (the 8086 and 8088 do not support virtual memory). The
result is an awesome 64-terabyte (Tb) capacity. This would require a mere 130,000,000
2-Mb boards! Or, since the 64 Tb is virtual, you could use 3 million 100-Mb disks.
Despite what magazine advertisements may claim, you probably shouldn’t send in your
check for the first terabyte PC right away. The 80386 also allows segments as large as
4 Gb, as compared to the 64K maximum in earlier processors.
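
The capacity ratios quoted above are easy to check with a few lines of Python. The absolute capacities used here (1 Mb of physical memory for the 8086 and 8088, 16 Mb physical and 1 Gb virtual for the 80286) are the commonly cited figures and are supplied as assumptions, since the paragraph itself states only the ratios and the 80386 totals.

```python
KB, MB, GB, TB = 2**10, 2**20, 2**30, 2**40

# Capacities in bytes; the 8086/8088 and 80286 values are assumed figures.
physical = {"8086/8088": 1 * MB, "80286": 16 * MB, "80386": 4 * GB}
virtual = {"80286": 1 * GB, "80386": 64 * TB}

print(physical["80386"] // physical["80286"])        # 256
print(physical["80386"] // physical["8086/8088"])    # 4096
print(virtual["80386"] // virtual["80286"] // KB)    # 64, that is, 64K times
print((4 * GB) // (64 * KB))                         # 65536: segment size ratio
```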

Of course, many of these new features apply only to programs written specifically 
for an 80386. MS-DOS programs, for example, are limited to 64K segments and 640K 
of memory, even on an 80386-based computer. After all, they can only use features 
that are compatible with the 8088 and 8086 processors. Similarly, OS/2 programs are 
limited to 64K segments and 16 Mb of memory. They can only use features that are 
compatible with the 80286. The advantages of the 80386 in running old programs are 
its higher clock speed, more extensive pipelining, 32-bit data paths and facilities, and 
faster mathematical hardware. 


WHAT’S NEXT IN MICROPROCESSORS?


Obviously, the 80386 is not the last word from Intel or other vendors. There are many
demands for large amounts of low-cost processing power that even it cannot satisfy.
Fortunately, there are well-known techniques microprocessor designers can use to
increase throughput. Among the ones you may see in future devices are:

• Larger on-chip instruction queues. These would allow even more
pipelining than is now possible.

• On-chip memories (called caches) for extremely high-speed access.
Program caches could hold an entire loop or sets of instructions from all
recent branch addresses. Either approach would avoid the need to refill
the pipeline after every jump.

• On-board stack caches to hold the top locations of the stack. This would
speed up subroutines that keep most of their parameters and variable data
on the stack.

• Separate data and instruction memories. This would allow overlapping
of data transfers with instruction fetches.

• Greater optimization of instructions and instruction sets.

• More registers to allow more on-chip storage.

Of course, we will also see higher clock speeds, pipelines with more stages, and 
faster memory and I/O chips. 

A more general approach is to design a processor with fewer but faster-executing 
instructions. It would need less decoding circuitry and could have more registers and 
arithmetic circuits. We refer to such a processor as a reduced-instruction set (RISC) 
machine. 


SUMMARY 


The 80386 microprocessor represents a significant advance in low-cost computing
power. It can run the large backlog of 8086 (MS-DOS) software much faster than can
the 8088, 8086, and 80286 microprocessors. It also provides access to a huge amount
of memory (4 Gb) but only in its native operating mode.

The 80386’s native architecture simplifies the implementation of many key features
of advanced computer systems, including:

• Virtual memory

• Multitasking

• Multiuser systems

• High-level languages and operating systems

• Security, protection, and privacy

Among its primary application areas are personal computers, CAD/CAM/CAE
systems, robotics, artificial intelligence, and signal processing.


In my experience, if you have to keep
the lavatory door shut by extending your
left leg, it’s modern architecture.
Nancy Banks-Smith, Guardian, 20 February 1979


I’ve finally learned what “upward compatible” means.
It means we get to keep all our old mistakes.
Dennis Van Tassel


This chapter describes the 80386’s basic architecture. It starts with the registers and 
then covers data types, addressing modes, and instructions. It deals with frequently 
used addressing modes and instructions first. The last sections discuss 8086 and 80286 
compatibility and introduce common assembler directives. 

The chapter presents features used by applications programmers as opposed to those
used only by systems programmers. Of course, as students of the data processing
culture know, “Real programmers don’t write applications programs. . . . Applications
programs are for dullards who can’t do systems programming.”





NOTATION





This book describes 80386 facilities and programs using notation from Microsoft’s
Macro Assembler. The key elements after a number are:

B    Binary
D    (or no designation) Decimal
H    Hexadecimal. Hexadecimal numbers that begin with a letter digit
     (A through F) require an initial zero to distinguish them from symbolic
     names. For example, we would write FF as 0FFH.

The default case (that is, unmarked) is decimal. Other symbols are:

:    After a label associated with an instruction statement or between segment
     register designations or segment numbers and offsets
;    Before a comment
' '  Around characters (before and after a string)
[ ]  Around a memory address

Names, numbers, and expressions not enclosed in brackets are taken to be data values.
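
The number notation can be summarized by a small Python routine that converts such a constant to an integer. This is a sketch of the rules as stated above, not a piece of the assembler: parse_number is a hypothetical name, and the routine handles only the B, D, and H suffixes.

```python
def parse_number(text):
    """Convert a number written with a B, D, or H suffix to an integer."""
    text = text.strip().upper()
    if not text or not text[0].isdigit():
        # FFH, for instance, would be taken as a symbolic name.
        raise ValueError(f"not a number (may be a symbolic name): {text!r}")
    suffix_bases = {"H": 16, "B": 2, "D": 10}
    base = suffix_bases.get(text[-1])
    if base is None:
        return int(text, 10)        # no designation: decimal
    return int(text[:-1], base)


print(parse_number("0FFH"))     # 255
print(parse_number("1011B"))    # 11
print(parse_number("25D"))      # 25
print(parse_number("25"))       # 25
```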


REGISTERS 


Figures 2-1 and 2-2 show the 80386’s basic architecture. Figure 2-2 omits the segment
registers that simply position program, data, and stack areas in memory. The
general-purpose or user registers (see Figure 2-2) are:

• EAX, the primary accumulator. It is also the holding place for most data
being moved into and out of the processor.

• EBX, the base (address) register. It often holds addresses for indexing
and indirection.

• ECX, the count register. Its main purpose is to hold the number of
iterations or shifts.

• EDX, the data register. It serves as an extension of the accumulator and
holds port addresses for input and output.


[Figure 2-1 shows three register groups. General data and address registers (bits 31
through 0): EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP. Segment registers: code,
stack, data, FS, and GS. Instruction pointer and flags register: EIP and EFLAGS,
with FLAGS as the low word (bits 15 through 0) of EFLAGS.]


Figure 2-1
80386 base architecture registers.


• ESI and EDI, the source and destination index registers. They often hold
pointers for array and string manipulation. The names come from their
roles in string instructions.

• EBP, the base pointer. Its main use is in accessing data on the stack.

• ESP, the stack pointer (nothing out of this world, unfortunately). Its main
use is in moving data and addresses to and from the stack.


Register    Bits 15-0    Bits 15-8    Bits 7-0
(bits 31-0)
EAX         AX           AH           AL
EBX         BX           BH           BL
ECX         CX           CH           CL
EDX         DX           DH           DL
ESI         SI
EDI         DI
EBP         BP
ESP         SP
EIP         IP


Figure 2-2
80386 general registers and instruction pointer.


• EIP, the instruction pointer or program counter. It locates the next
instruction the processor will fetch from memory. Its close relative, EIO,
is well-known for its role on Old MacDonald’s Farm (sorry, but I couldn’t
resist this!).

The initial E in the names indicates an “extended” or 32-bit register. As Figure 2-1
shows, the low words (bits 0 through 15) of these registers have the same names without
the E. The 16-bit registers, which can be accessed separately, are carryovers from
earlier 16-bit processors (8088, 8086, and 80286). Note that all general-purpose
registers have 16-bit subunits.


As Figure 2-2 shows, parts of some registers are byte addressable. As this feature is
also derived from earlier processors, it applies only to the low words. The byte-length
registers are:

• AH and AL (bits 8 through 15 and 0 through 7 of EAX)

• BH and BL (bits 8 through 15 and 0 through 7 of EBX)

• CH and CL (bits 8 through 15 and 0 through 7 of ECX)

• DH and DL (bits 8 through 15 and 0 through 7 of EDX)
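
The relationship between a 32-bit register and its 16-bit and 8-bit subunits amounts to simple masking and shifting. The following Python sketch extracts AX, AH, and AL from a value of EAX. The function name subregisters and the sample value are hypothetical.

```python
def subregisters(eax):
    """Return the AX, AH, and AL views of a 32-bit EAX value."""
    ax = eax & 0xFFFF           # bits 0 through 15
    ah = (eax >> 8) & 0xFF      # bits 8 through 15
    al = eax & 0xFF             # bits 0 through 7
    return ax, ah, al


ax, ah, al = subregisters(0x12345678)
print(hex(ax), hex(ah), hex(al))    # 0x5678 0x56 0x78
```

Note that nothing in this scheme names the upper word (bits 16 through 31) separately, which matches the list above.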


[Figure 2-3 labels the following bits of the flags register: virtual mode, resume
flag, nested task flag, I/O privilege level, overflow, direction flag, interrupt
enable, trap flag, sign flag, zero flag, auxiliary carry, parity flag, and carry flag.]


Figure 2-3
80386 flags register.


SI, DI, BP, SP, and IP are not byte addressable. This inconsistency is the result of three 
generations of upward-compatible designs. If a camel is a “horse designed by a com- 
mittee,” just imagine how a new, totally compatible version would look. 

In general, the 80386 can address any 8-, 16-, or 32-bit register. Although we have 
mentioned primary uses of the general-purpose registers, the 80386 has few actual 
restrictions. There are some limitations on the use of 16-bit registers in accordance with 
the 8086 and 80286 architectures. In particular, only BP and BX are available as base 
registers and only DI and SI as index registers. Even in the 32-bit mode, however, op- 
posing historical tradition may cost time and memory. Conservatism has a payoff here. 

The instruction pointer (EIP) or program counter (PC) differs from other user 
registers. This is why it appears in a separate part of Figure 2-1. The difference is that 
the processor controls EIP most of the time. In particular, the processor increases it 
automatically after fetching an instruction from memory. Like most processors, the 
80386 thus executes instructions sequentially unless specifically told to do otherwise. 
The programmer can control EIP through instructions (calls, jumps, returns, and 
software interrupts or traps) that change its value explicitly. 

The bottom part of Figure 2-1 also shows a status register called EFLAGS. The low 
word of this register, called FLAGS, is compatible with earlier processors. Figure 2-3 
shows EFLAGS in detail. For now, we will be concerned only with the following bits: 

• OF (overflow) indicates whether the latest arithmetic operation or shift
produced an (arithmetic) overflow.

• DF (direction) indicates whether string instructions will increase (0) or
decrease (1) their pointers.

• IF (interrupt enable) indicates whether the processor will recognize
interrupts. A 1 value indicates that it will, a 0 that it will not. IF is an enable,
not a mask.

• TF (trap) indicates whether the processor is operating in a single-step
mode. A 1 value indicates that it is, a 0 that it is not.

• SF (sign) indicates whether the result of the latest arithmetic or logical
operation had a 1 in its most significant bit. The most significant bit is
bit 7 for 8-bit operations, bit 15 for 16-bit operations, and bit 31 for
32-bit operations.

• ZF (zero) indicates whether the result of the latest arithmetic or logical
operation was 0. Note that ZF is 1 if the result was zero, and 0 if it was
not. Although this flag is standard in processor architecture, it remains a
source of confusion.

• AF (auxiliary carry) indicates whether the latest arithmetic operation
produced a carry from bit 3. AF’s main use is in decimal arithmetic.

• PF (even parity) indicates whether the result of the latest arithmetic or
logical operation had even parity. PF is 1 if it did and 0 if it did not. Even
parity means that the number of 1 bits is even. PF reflects only the low
byte (bits 0 through 7) of the result, regardless of how many bits the
operation involves.

• CF (carry) indicates whether the latest arithmetic, logical, or shift
instruction produced a carry. Logical instructions such as AND and OR cannot
produce a carry, so they always clear this flag. Bit manipulation
instructions use CF to store the tested bit’s value.

The common flags are (from right to left): Carry, Zero, and Sign (CF, ZF, and SF).
Programs often use their values to choose between alternative paths. Carry also
transfers a bit between operations in multiple-precision arithmetic.
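
To make the flag definitions concrete, here is a minimal Python sketch that models how an 8-bit addition would set the six arithmetic flags described above. It is an illustration only, not processor code: the function name add8_flags and the dictionary layout are assumptions made for this example, and DF, IF, and TF are left out because arithmetic does not set them.

```python
def add8_flags(a, b):
    """Model an 8-bit ADD and return (result, flags).

    Hypothetical helper for illustration; flag meanings follow the text.
    """
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        # signed overflow: both operands differ in sign from the result
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
        "SF": (result >> 7) & 1,                      # copy of bit 7
        "ZF": int(result == 0),                       # 1 when the result IS zero
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),    # carry out of bit 3
        "PF": int(bin(result).count("1") % 2 == 0),   # even number of 1 bits
        "CF": int(total > 0xFF),                      # carry out of bit 7
    }
    return result, flags
```

Adding 7F hex and 01 hex in this model gives 80 hex with OF and SF set and CF clear, which shows why signed and unsigned programs test different flags.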


The middle part of Figure 2-1 shows the segment registers. They select subdivisions
of memory called segments. Note that they are 16-bit registers, even though the 80386
is a 32-bit processor. The segment registers are:

• CS, the code segment register, selects the segment from which the
processor will obtain instructions.

• SS, the stack segment register, selects the segment occupied by the stack.

• DS, ES, FS, and GS, the data segment registers, select the current
segments used for data. We call DS the data segment register and ES the
extra (data) segment register. They were the only data segment registers
in earlier processors. The names FS and GS, the additions to the 80386,
have no significance other than following ES alphabetically. Fortunately,
Intel did not go backward and give us AS and BS.


Figure 2-4
Control register 0.
[Labeled field: MSW. Note: some bits are Intel reserved; do not define.]

Figure 2-5
Control registers 2 and 3.
[Note: some bits are Intel reserved; do not define.]

Figure 2-6
System address and system segment registers.
[System address registers: 32-bit linear base address (bits 47 through 16) and limit (bits 15 through 0). System segment registers: descriptor registers (automatically loaded) holding a 32-bit linear base address, a 32-bit segment limit, and attributes.]

Figure 2-7
Debug and test registers.
[Debug registers: DR0 through DR3, linear breakpoint addresses 0 through 3; DR4 and DR5, Intel reserved, do not define; DR6, breakpoint status; DR7, breakpoint control. Test registers (for page cache): TR6, test control; TR7, test status.]

The 80386 also has specialized registers. Only operating systems generally use them 
and then only for initialization, status management, and other infrequent operations. 
Figures 2-4 through 2-7 show the specialized registers. We will describe them in detail 
in later chapters. For now, let us merely list them as: 

• Control registers 0, 2, and 3 (Figures 2-4 and 2-5). What happened to
control register 1? It exists, but Intel has reserved its use for the present.

• System address and segment registers (Figure 2-6). These are GDTR
(global descriptor table register), IDTR (interrupt descriptor table
register), TR (task register), and LDTR (local descriptor table register).

• Debug and test registers (Figure 2-7).
Figure 2-8
80386 supported data types.


DATA TYPES





Figure 2-8 summarizes the data types handled by the 80386. The primary ones are 8- 
bit, 16-bit, and 32-bit integers. Note the following definitions: 


• A byte is an 8-bit unit.

• A word is a 16-bit unit.

• A double word (or dword, but don’t ask me how to pronounce it) is a
32-bit unit.

• A quad word (or qword) is a 64-bit unit.

• A BCD (binary-coded-decimal) representation is one that encodes each
decimal digit separately.

• A string is any contiguous sequence of units.


Note that we call 16 bits a word and 32 bits a double word, even though the 80386 
is a 32-bit processor. This usage is consistent with definitions for 16-bit processors. 
However, it is contrary to standard definitions and to terminology in many other
computers. 

The supported data types are: 


• Single bits.

• Bit strings, sets of contiguous bits that may be up to 4 gigabits long. A
gigabit (not to be confused with a gigabyte) is 2^30 bits.

• Signed or unsigned bytes. Bit 7 is the sign bit for a signed byte.

• Signed or unsigned words (integers). Bit 15 is the sign bit.

• Signed or unsigned double words (long integers). Bit 31 is the sign bit.

• Signed or unsigned quad words. Bit 63 is the sign bit.

• Offsets, 16- or 32-bit quantities that contain the distance from a base
address to the referenced address.

• Pointers, consisting of a 16-bit segment selector and a 16-bit or 32-bit
offset.

• Characters, 8-bit units usually containing representations in ASCII
(American Standard Code for Information Interchange).

• Strings, contiguous sequences of units containing up to 4 Gb. The units
may be bytes, words, or double words.

• Packed and unpacked BCD. Packed BCD has two digits per byte,
unpacked BCD one digit per byte.

• Floating point. An 80287 or 80387 numerical coprocessor (see Chapter
8 and Appendixes G, H, and I) handles 32-bit, 64-bit, and 80-bit
representations of real numbers. The formats come from the IEEE 754
standard (IEEE Standard for Binary Floating-Point Arithmetic, IEEE,
New York, 1985).


Figure 2-9
80386 data type storage.


As mentioned earlier, the primary data types are bytes, words, and double words.
Most instructions apply to them as shown in Table 2-1. Only a few instructions operate
on the other types.
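
The difference between packed and unpacked BCD is easy to see in a short Python sketch. The helper names to_unpacked_bcd and to_packed_bcd are hypothetical, the input is assumed to be a non-negative integer, and the bytes are returned most significant digit first purely for readability; the sketch makes no claim about the order in which a program would store them in memory.

```python
def to_unpacked_bcd(n):
    """One decimal digit per byte (hypothetical helper, n >= 0)."""
    return bytes(int(d) for d in str(n))


def to_packed_bcd(n):
    """Two decimal digits per byte, high digit in the upper 4 bits."""
    digits = str(n)
    if len(digits) % 2:          # pad to an even number of digits
        digits = "0" + digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )
```

For example, the decimal number 1987 takes four bytes (01, 09, 08, 07) in unpacked form but only two bytes (19 hex, 87 hex) in packed form.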


80386 Programming Guide 


Table 2-1
Principal Data Types and Instructions

Type                  Size                Instructions

Integer, Ordinal      8, 16, 32 bits      Move, Exchange, Translate, Test, Compare,
                                          Convert, Shift, Double Shift, Rotate, Not,
                                          Negate, And, Or, Exclusive Or, Add, Subtract,
                                          Multiply, Divide, Increment, Decrement,
                                          Convert (Move with sign/zero extension)
Unpacked Decimal      1 digit             Adjust for: Add, Subtract, Multiply, Divide
Packed Decimal        2 digits            Adjust for: Add, Subtract
String (byte, word,   0-4G bytes,         Move, Load, Store, Compare, Scan, Repeat
dword)                words, dwords
Bit String            1-4G bits           Test, Test and Set, Test and Reset, Test and
                                          Complement, Scan, Insert, Extract
Near Pointer*         32 bits             (Same as Ordinal)
Far Pointer           48 bits             Load

* A near pointer is a 32-bit offset into a segment defined by one of the
segment/descriptor register pairs. A far pointer is a full logical address, that is,
a selector and an offset.


Figure 2-9 shows how data types appear in memory. In general, the low byte is al- 
ways at the lowest (starting or base) address. Bytes of increasing significance are at 
successively higher addresses. Although this may seem backward at first, it does allow 
arithmetic, logical, array, and string operations to begin with the low byte at the lowest 
address. 

Here are examples of 80386 data storage: 


1. Suppose we have a word stored at address C0000 hex. Its low byte is in address
C0000, and its high byte is in address C0001. If a dump utility displays these
addresses consecutively, the low byte will be at the left. This is, of course, opposite to
the standard arrangement. For example, if C0000 contains 37 and C0001 contains
A1 hex, the byte-oriented display will show 37A1. However, the 80386 will read
the word as A137. If this makes sense to you, you probably should be a tax
accountant.

2. Suppose we have a quad word (64 bits) stored at address 9D718. It will appear in
memory as follows:

• Bits 0 through 7 in address 9D718

• Bits 8 through 15 in address 9D719

• Bits 16 through 23 in address 9D71A

• Bits 24 through 31 in address 9D71B

• Bits 32 through 39 in address 9D71C

• Bits 40 through 47 in address 9D71D

• Bits 48 through 55 in address 9D71E

• Bits 56 through 63 in address 9D71F


Note that the sign bit is in address 9D71F. Reading a quad word from a dump is easy 
if your native language is Hebrew. 
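
The two storage examples can be checked with a few lines of Python. This is only a sketch of the byte ordering, not of the 80386 itself; the helper names store, dump, and read are made up for the illustration.

```python
def store(value, size):
    """Bytes of an unsigned value in ascending address order (low byte first)."""
    return value.to_bytes(size, "little")


def dump(data):
    """What a byte-oriented dump shows: bytes printed in address order."""
    return "".join(f"{b:02X}" for b in data)


def read(data):
    """Value the processor reads back from bytes stored low byte first."""
    return int.from_bytes(data, "little")
```

Here dump(store(0xA137, 2)) yields the string 37A1, while read applied to the bytes 37 and A1 returns A137 hex. For a quad word with only bit 63 set, the last of the eight bytes (the one at the highest address) is the only nonzero one.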


ADDRESSING MODES


The 80386 has many addressing modes that provide different ways to access operands.
The common ones are:

• Register. The operand is in a register.

• Immediate. The operand is part of the instruction, usually immediately
following the operation code.

• Direct. The operand’s address is part of the instruction.

• Register indirect. The operand’s address is in a base or index register.

• Index. The operand’s address is the sum of an index register and a
displacement. The displacement is part of the instruction.

• Based index with displacement. The operand’s address is the sum of an
index register, a base register, and a displacement.

Note the following definitions:

A displacement is an 8-bit, 16-bit, or 32-bit immediate value following an operation
code. This is a fixed part of program memory.

A base is the contents of any general-purpose register. The registers most often used
for this purpose are EBX and EBP.

An index is the contents of any general-purpose register except ESP. The registers
most often used for this purpose are ESI and EDI.


Let us now look at examples of the common addressing modes:

1. MOV EBX,EAX. This instruction uses register addressing to move the contents of
register EAX to register EBX. We will explain the seemingly backward order of the
operands shortly.

2. ADD AL,5. This instruction uses immediate addressing to add the number 5 to
register AL. Note that an unmarked number is treated as data, not as an address. The
same holds for an unmarked name or expression in standard assembler notation.

3. OR AL,[2000H]. This instruction uses direct addressing to logically OR the
contents of memory location 2000 (hex) with register AL. Note that brackets around a
number or expression indicate that it is an address, not data.

4. MOV ECX,[EBX]. This instruction uses register indirect addressing to move data
from the address in EBX to register ECX. If, for example, EBX contains 1000 (hex),
ECX ends up with the contents of memory locations 1000 (low byte) through 1003
(high byte).

5. MOV AL,100H[ESI]. This instruction uses the index mode to load AL from
the address obtained by adding 100 (hex) to the contents of ESI. If, for example,
[ESI] = 1700 hex, the effective address is 1700 + 100 = 1800 hex. An alternative
(perhaps clearer) notation is MOV AL,[ESI+100H].

6. MOV EAX,5[EBX+ESI]. This instruction uses the based index mode with a
displacement. The effective address is the sum of EBX, ESI, and 5 (the displacement).
If, for example, [EBX] = 1D00 (hex) and [ESI] = 0200 (hex), the effective address
is 1D00 + 0200 + 5 = 1F05 (hex). The new contents of EAX come from that address
and the next three higher addresses (that is, 1F05 through 1F08). Alternative
notations are MOV EAX,[EBX+ESI+5] and MOV EAX,5[EBX][ESI].


Less common addressing modes use a scaled index. Here the processor multiplies
the index by a scaling factor (2, 4, or 8) and uses the product to compute the effective
address. This mode is useful for accessing arrays with multibyte entries. For example,
suppose we have an array of 32-bit addresses. If its base address is in EBX and the
element number is in ESI, the following instruction will load an element into EAX:

MOV EAX,[EBX+ESI*4]
Table 2-2
80386 Addressing Modes

Addressing Mode          Form                     Effective Address

Register                 r1,r2                    Register
Immediate                data                     Address immediately after operation
                                                  code contains data
Direct                   addr                     Addr
Register Indirect        [base]                   Contents of base register
Based                    disp[base]               Contents of base register + disp
Index                    disp[index]              Contents of index register + disp
Scaled Index             disp[index*scale]        Scale factor x Contents of index
                                                  register + disp
Based Index              [base+index]             Contents of base register + Contents
                                                  of index register
Based Scaled Index       [base+index*scale]       Scale factor x Contents of index
                                                  register + Contents of base register
Based Index with         disp[base+index]         Contents of base register + Contents
Displacement                                      of index register + disp
Based Scaled Index       disp[base+index*scale]   Contents of base register + Scale
with Displacement                                 factor x Contents of index register
                                                  + disp

Scaled indexing is necessary because each element occupies 4 bytes of memory. In
previous processors, a multiplication or shift was necessary to convert the element
number into an actual offset from the base address.

Table 2-2 summarizes the 80386’s addressing modes. 
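
All the memory addressing modes in Table 2-2 are special cases of one formula: base + index x scale + displacement. The following Python sketch evaluates that formula so the worked examples can be verified. The function name effective_address and its keyword arguments are assumptions for this illustration, and wrapping the sum to 32 bits assumes 32-bit address arithmetic.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """Compute base + index*scale + disp, wrapped to 32 bits.

    Hypothetical helper; omitted components default to zero (scale to 1).
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF
```

With index = 1700 hex and disp = 100 hex the result is 1800 hex, and with base = 1D00 hex, index = 0200 hex, and disp = 5 it is 1F05 hex, matching the examples given earlier. For an array of double words at 2000 hex, element 3 lies at effective_address(base=0x2000, index=3, scale=4), that is, 200C hex.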


One result of the 80386’s increased pipelining is that it can calculate effective
addresses while doing other operations. Most addressing modes therefore require no extra
clock cycles. The only exceptions are the based index modes with displacement. They
take one extra clock cycle.









INSTRUCTION SET 


Table 2-3 contains a complete list of the 80386’s instructions. Before describing the 
set in detail, let us first consider the most frequently used instructions. A rule of thumb 
is that 20 percent of an instruction set (or a language’s set of statements) makes up 80 
percent of most programs. In assembly language, the rule is probably even stricter. 
Generally speaking, 10 percent of most processors’ instruction sets makes up 90 per- 
cent of assembly language programs. This is not to say that the other instructions are 
unnecessary but just that they are uncommon. This observation is one reason behind 
the development of RISC (reduced-instruction-set) computers. 

Table 2-4 lists the 80386’s frequently used instructions. We may characterize them 
further as follows: 

• Data Transfer

IN      input
LEA     load effective address
MOV     move
OUT     output
POP     load from stack
PUSH    store on stack

• Arithmetic/Logical

ADC     add with carry
ADD     add
AND     logical AND
CMP     compare
Table 2-3
Complete 80386 Instruction Set

Instruction  Meaning                             Assembler Format
AAA          ASCII Adjust after Add              AAA
AAD          ASCII Adjust before Divide          AAD
AAM          ASCII Adjust after Multiply         AAM
AAS          ASCII Adjust after Subtract         AAS
ADC          Add with Carry                      ADC dest,src
ADD          Add                                 ADD dest,src
AND          Logical AND                         AND dest,src
ARPL         Adjust RPL Field of Selector        ARPL sel,reg
BOUND        Check Array Bounds                  BOUND reg,bound
BSF          Bit Scan Forward                    BSF dest,src
BSR          Bit Scan Reverse                    BSR dest,src
BT           Bit Test                            BT base,offset
BTC          Bit Test and Complement             BTC base,offset
BTR          Bit Test and Reset                  BTR base,offset
BTS          Bit Test and Set                    BTS base,offset
CALL         Call Procedure                      CALL dest
CBW          Convert Byte to Word                CBW
CDQ          Convert Double Word to Quad Word    CDQ
CLC          Clear Carry Flag                    CLC
CLD          Clear Direction Flag                CLD
CLI          Clear Interrupt Flag                CLI
CLTS         Clear Task-Switched Flag            CLTS
CMC          Complement Carry Flag               CMC
CMP          Compare                             CMP dest,src
CMPS         Compare Strings                     CMPS dest,src
CWD          Convert Word to Double Word*        CWD
CWDE         Convert Word to Double Word*        CWDE
DAA          Decimal Adjust after Add            DAA
DAS          Decimal Adjust after Subtract       DAS
DEC          Decrement by 1                      DEC dest
DIV          Unsigned Divide                     DIV acc,src
ENTER        Make Stack Frame for Procedure      ENTER storage,level
ESC          Escape                              ESC
HLT          Halt                                HLT
IDIV         Signed Divide                       IDIV acc,src
IMUL         Signed Multiply                     IMUL acc,src
IN           Input from Port                     IN acc,port
INC          Increment by 1                      INC dest
INT          Software Interrupt (Trap)           INT inttype
INTO         Interrupt If Overflow               INTO
IRET         Interrupt Return                    IRET
JA           Jump If Above                       JA dest
JAE          Jump If Above or Equal              JAE dest
JB           Jump If Below                       JB dest
JBE          Jump If Below or Equal              JBE dest
JC           Jump If Carry                       JC dest
JCXZ         Jump If CX Is Zero                  JCXZ dest
JE           Jump If Equal                       JE dest
JECXZ        Jump If ECX Is Zero                 JECXZ dest
JG           Jump If Greater                     JG dest
JGE          Jump If Greater or Equal            JGE dest
JL           Jump If Less                        JL dest
JLE          Jump If Less or Equal               JLE dest
JMP          Jump Unconditionally                JMP dest
JNA          Jump If Not Above                   JNA dest
JNAE         Jump If Not Above or Equal          JNAE dest
JNB          Jump If Not Below                   JNB dest
JNBE         Jump If Not Below or Equal          JNBE dest
JNC          Jump If No Carry                    JNC dest
JNE          Jump If Not Equal                   JNE dest
JNG          Jump If Not Greater                 JNG dest
JNGE         Jump If Not Greater or Equal        JNGE dest
JNL          Jump If Not Less                    JNL dest
JNLE         Jump If Not Less or Equal           JNLE dest
JNO          Jump If No Overflow                 JNO dest
JNP          Jump If Parity Odd                  JNP dest
JNS          Jump If Sign Positive               JNS dest
JNZ          Jump If Not Zero                    JNZ dest
JO           Jump If Overflow                    JO dest
JP           Jump If Parity Even                 JP dest
JPE          Jump If Parity Even                 JPE dest
JPO          Jump If Parity Odd                  JPO dest
JS           Jump If Sign Negative               JS dest
JZ           Jump If Zero                        JZ dest
LAHF         Load Flags into AH Register         LAHF
LAR          Load Access Rights Byte             LAR reg,src
LDS          Load DS Register                    LDS reg,src
LEA          Load Effective Address              LEA reg,src
LEAVE        Leave Procedure                     LEAVE
LES          Load ES Register                    LES reg,src
LFS          Load FS Register                    LFS reg,src
LGS          Load GS Register                    LGS reg,src
LGDT         Load GDT Register                   LGDT src
LIDT         Load IDT Register                   LIDT src
LLDT         Load LDT Register                   LLDT src
LMSW         Load Machine Status Word            LMSW src
LOCK         Lock Bus                            LOCK
LODS         Load String                         LODS src
LOOP         Loop with CX Counter                LOOP dest
LOOPE        Loop If Equal                       LOOPE dest
LOOPNE       Loop If Not Equal                   LOOPNE dest
LOOPNZ       Loop If Not Zero                    LOOPNZ dest
LOOPZ        Loop If Zero                        LOOPZ dest
LSL          Load Segment Limit                  LSL reg,src
LSS          Load SS Register                    LSS reg,src
LTR          Load Task Register                  LTR src
MOV          Move Data                           MOV dest,src
MOV          Move to/from Special Regs           MOV dest,src
MOVS         Move String                         MOVS dest,src
MOVSX        Move with Sign-Extend               MOVSX reg,src
MOVZX        Move with Zero-Extend               MOVZX reg,src
MUL          Unsigned Multiply                   MUL acc,src
NEG          2’s Complement Negation             NEG dest
NOP          No Operation                        NOP
NOT          1’s Complement Negation             NOT dest
OR           Logical Inclusive OR                OR dest,src
OUT          Output to Port                      OUT port,acc
OUTS         Output String                       OUTS DX,src
POP          Pop Operand off Stack               POP dest
POPA         Pop All General Registers           POPA
POPF         Pop Flags off Stack                 POPF
PUSH         Push Operand onto Stack             PUSH src
PUSHA        Push All General Registers          PUSHA
PUSHF        Push Flags onto Stack               PUSHF
RCL          Rotate Left through Carry           RCL dest,count
RCR          Rotate Right through Carry          RCR dest,count
REP          Repeat                              REP
REPE         Repeat while Equal                  REPE
REPNE        Repeat while Not Equal              REPNE
REPNZ        Repeat while Not Zero               REPNZ
REPZ         Repeat while Zero                   REPZ
RET          Return from Procedure               RET
ROL          Rotate Left                         ROL dest,count
ROR          Rotate Right                        ROR dest,count
SAHF         Store AH Register in Flags          SAHF
SAL          Shift Arithmetic Left               SAL dest,count
SAR          Shift Arithmetic Right              SAR dest,count
SBB          Subtract with Borrow                SBB dest,src
SCAS         Compare String                      SCAS dest
SETcc        Set Byte on Condition               SETcc dest
SGDT         Store GDT Register                  SGDT dest
SHL          Shift Logical Left                  SHL dest,count
SHLD         Double Precision Shift Left         SHLD dest,src,count
SHR          Shift Logical Right                 SHR dest,count
SHRD         Double Precision Shift Right        SHRD dest,src,count
SIDT         Store IDT Register                  SIDT dest
SLDT         Store LDT Register                  SLDT dest
SMSW         Store Machine Status Word           SMSW dest
STC          Set Carry Flag                      STC
STD          Set Direction Flag                  STD
STI          Set Interrupt Flag                  STI
STOS         Store String                        STOS dest
STR          Store Task Register                 STR dest
SUB          Subtract                            SUB dest,src
TEST         Logical Compare                     TEST dest,src
VERR         Verify Segment for Reading          VERR sel
VERW         Verify Segment for Writing          VERW sel
WAIT         Wait until BUSY# Negated            WAIT
XCHG         Exchange Operand, Register          XCHG dest,src
XLAT         Table Lookup                        XLAT source-table
XOR          Logical Exclusive OR                XOR dest,src


* CWD sign extends register AX into registers DX and AX, whereas CWDE sign 
extends AX into EAX. 
DEC     subtract 1
INC     add 1
NOT     logical NOT (complement or invert)
ROL     rotate left
ROR     rotate right
SBB     subtract with borrow
SHL     shift logical left
SHR     shift logical right
SUB     subtract
TEST    bit test

• Program Control

CALL    call subroutine
INT     interrupt (trap)
JA      jump if above
JAE     jump if above or equal
JB      jump if below
JBE     jump if below or equal
JC      jump if carry
JE      jump if equal
JMP     jump unconditionally
JNC     jump if not carry
JNE     jump if not equal
JNS     jump if not sign
JNZ     jump if not zero
JS      jump if sign
JZ      jump if zero
RET     return from subroutine


Frequently Used Data Transfer Instructions 


The frequently used data transfer instructions are IN, LEA, MOV, OUT, PUSH, and 
POP. Let us now describe them in more detail. 
MOV moves data from one address to another. It is really a “copy” instruction, since
the source does not change. The general form is

MOV destination,source

Note that the destination comes first. For example,

MOV BL,CL

moves the contents of register CL to BL. Register CL does not change. Be careful:
the direction is the opposite of what you might expect. However, it is exactly the same
as the direction in assignment statements (such as C = B) in high-level languages
(BASIC, C, or Pascal). Reversing the source and destination in moves is a common
error in 80386 assembly language programs.


Table 2-4
Frequently Used 80386 Instructions

Instruction  Meaning
ADC          Add with carry
ADD          Add
AND          Logical AND
CALL         Call subroutine
CMP          Compare
DEC          Subtract 1
IN           Input
INC          Add 1
INT          Interrupt (trap)
JA           Jump if above
JAE          Jump if above or equal
JB           Jump if below
JBE          Jump if below or equal
JC           Jump if carry
JE           Jump if equal
JMP          Jump unconditionally
JNC          Jump if not carry
JNE          Jump if not equal
JNS          Jump if sign positive
JNZ          Jump if not zero
JS           Jump if sign negative
LEA          Load effective address
MOV          Move
NOT          Logical NOT (complement or invert)
OUT          Output
POP          Load from stack
PUSH         Store on stack
RET          Return from subroutine
ROL          Rotate left
ROR          Rotate right
SBB          Subtract with borrow

Either operand in MOV can use any 80386 addressing mode. The only limitation is 
that one operand must be either a register or an immediate data value. Thus MOV can 
do the following transfers: 

• Register to register

• Memory to register

• Register to memory

• Immediate data to register or memory

MOV can put immediate data into memory without using any registers. It cannot, 
however, transfer variable data from one memory address to another. Some examples 
are: 


1. MOV AL,BL. This instruction moves a data byte from register BL to register AL.
If a register is a move’s source or destination, its length determines the size of the
data (byte, word, or double word).

2. MOV AL,8[EBX]. This instruction loads a byte of data from an effective address
into register AL. The effective address is the contents of EBX plus 8. Here again,
the destination register (AL) determines the data’s size.

3. MOV BYTE PTR [EBX+ESI],0CH. This instruction moves the value 0CH to the
effective address given by the sum of registers EBX and ESI. As neither operand is
a register, we need a way to indicate the data’s size. The alternatives are:

BYTE PTR
WORD PTR
DWORD PTR

Note that you must put one of these in the instruction. Most assemblers do not
provide a default.

IN and OUT move data from and to peripherals, respectively. Peripherals have their
own 64K address space, separate from memory addresses. We call this approach
isolated input/output. I/O addresses are 16 bits long and are nonsegmented. That is,
segment registers do not apply to I/O addresses, and hence, no address translation is
required.
IN and OUT have two primary forms:

1. With a fixed port address that is part of the instruction. Intel has
reserved ports F8 through FF hex for use with coprocessors.

2. With the port address supplied through register DX. This approach allows
access to the entire 64K I/O address space.

Note that IN and OUT always use an accumulator (AL, AX, or EAX) and either a 
fixed port address or register DX. No other combinations are allowed. Note also that 
there are no brackets around DX, even though it is an indirect address. 


Examples 


1. The following instruction moves a byte from port 50 (hex) to accumulator AL:
IN AL,50H
2. The following instruction sequence moves a double word from accumulator EAX
to port number FF00 (hex):
MOV DX,0FF00H
OUT DX,EAX
3. The following instruction sequence loads EAX with a double word from port
number C180 (hex):
MOV DX,0C180H
IN EAX,DX


PUSH and POP move data to and from the stack. They are usually 32-bit transfers.
PUSH and POP are useful for saving registers during subroutine calls and other similar
tasks. They are especially common in programs derived from high-level languages
(such as Pascal) that transfer all parameters on the stack.


Examples 


1. The following instruction stores the contents of register ECX at the top of the stack:
PUSH ECX
2. The following instruction loads register ESI from the top of the stack:
POP ESI

PUSH and POP both update the stack pointer automatically. PUSHAD and POPAD
are special versions that move all general-purpose registers (see Figure 2-2) to and from
the stack.
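
The automatic stack pointer update can be modeled in a few lines of Python. This is a simplified sketch, not a description of the hardware: the class name Stack386, the dictionary used as memory, and the starting ESP value are all assumptions for the illustration. It models only 32-bit transfers, with the stack growing toward lower addresses and each value stored low byte first.

```python
class Stack386:
    """Toy model of 32-bit PUSH and POP (hypothetical, for illustration)."""

    def __init__(self, esp=0x1000):
        self.esp = esp          # stack pointer
        self.memory = {}        # address -> byte

    def push(self, value):
        self.esp -= 4           # ESP moves down before the store
        for i in range(4):      # low byte at the lowest address
            self.memory[self.esp + i] = (value >> (8 * i)) & 0xFF

    def pop(self):
        value = sum(self.memory[self.esp + i] << (8 * i) for i in range(4))
        self.esp += 4           # ESP moves back up after the load
        return value
```

Pushing 12345678 hex with ESP at 1000 hex leaves ESP at 0FFC hex with the byte 78 at that address; a following pop returns the same value and restores ESP to 1000 hex.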

LEA loads an effective address into an index or base register. That is, it goes through 
the entire calculation specified by an addressing mode. But, instead of using the effec- 
tive address, it simply saves it in a register. This is useful for computing an address 
parameter for a subroutine and for speeding up sequences that use the same effective 
address several times. 


Examples 


1. The following instruction loads register EBX with the contents of ESI plus 8:
LEA EBX,8[ESI]
LEA is thus also useful for doing simple arithmetic, particularly because it does not
overwrite the original operand. This single instruction is equivalent to
MOV EBX,ESI
ADD EBX,8
2. The following instruction loads register ECX with the sum of EBX, ESI, and 6:
LEA ECX,6[EBX+ESI]
Not only does this avoid recalculations, but it also performs a step in sequences for
multilevel indirect and indexed addressing.
3. The following instruction loads register EBX with 5 times the contents of EAX:
LEA EBX,[EAX+4*EAX]
It uses scaled indexing to do a fast multiply. Of course, the process only works for
multiplications by 2, 3, 4, 5, 8, or 9.
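
The list of multipliers in example 3 follows directly from the available scale factors, as this short Python sketch shows. The names lea and lea_multipliers are hypothetical; the sketch assumes the same register serves as both base and index (or as index alone) and that the displacement is zero.

```python
def lea(base=0, index=0, scale=1, disp=0):
    """Value LEA would place in its destination (32-bit wraparound)."""
    return (base + index * scale + disp) & 0xFFFFFFFF


def lea_multipliers():
    """Multipliers reachable with one LEA using a single source register."""
    scales = (1, 2, 4, 8)
    with_base = {1 + s for s in scales}   # [reg + reg*s] gives 2, 3, 5, 9
    index_only = set(scales)              # [reg*s] gives 1, 2, 4, 8
    return sorted((with_base | index_only) - {1})
```

For instance, lea(base=7, index=7, scale=4) returns 35, which is 5 times 7, and lea_multipliers() returns the same six factors named in the text.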


Frequently Used Arithmetic and Logical Instructions 


The frequently used arithmetic and logical instructions are all straightforward. Note 
the following: 
• Double-operand instructions take the form:

Operation code destination,source

The result replaces the destination. For example:

SUB EAX,EBX

subtracts the contents of register EBX from register EAX and puts the difference in
EAX. The order here is what one would expect.

• CMP acts just like SUB but does not save the result. It affects only the
flags.

• TEST acts just like a logical AND but does not save the result. It is thus
the logical version of CMP.

• ADC and SBB include the carry in addition and subtraction, respectively.
The results are:

ADC: (dest) = (dest) + (src) + Carry
SBB: (dest) = (dest) - (src) - Carry

• Shifts can specify their counts in any of three ways:

Implicitly for a single shift.
Immediate value (1 to 31).
Value in register CL (lowest 5 bits only). Note that only CL can be
used in this way.

Typical forms are:

SHL AL,1     shifts AL left 1 bit position.
SHL AL,5     shifts AL left 5 bit positions.
SHR AX,CL    shifts AX right a number of bit positions given by
             the five least significant bits of register CL.

The 80386 has no explicit clear instruction. However, you can use

SUB reg,reg

to clear a register. The result is obviously zero. Many programmers prefer the
equivalent but more confusing

XOR reg,reg

To verify that its result is zero, remember that the EXCLUSIVE OR of 2 bits is 0 if
they are equal and 1 otherwise.
The results of arithmetic and logical instructions can end up in memory as well as 
in registers. For example: 
1. The following instruction logically ANDs the 8-bit number F0 hex with the contents
of address 4000 hex:
AND BYTE PTR [4000H],0F0H
2. The following instruction adds register EAX to the double word starting at the ad- 
dress 10 bytes beyond [EBX]: 
ADD 10[EBX],EAX 
3. The following instruction complements the 16-bit number at address 3000 hex: 
NOT WORD PTR [3000H] 
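
The ADC formula given above is what makes multiple-precision arithmetic possible. As a rough Python sketch, a 64-bit addition can be built from two 32-bit steps: an ADD on the low halves followed by an ADC on the high halves. The names add32 and add64 and the (low, high) argument order are assumptions for this illustration.

```python
MASK32 = 0xFFFFFFFF


def add32(dest, src, carry_in=0):
    """Model ADD (carry_in=0) or ADC; return (32-bit result, carry out)."""
    total = (dest & MASK32) + (src & MASK32) + carry_in
    return total & MASK32, total >> 32


def add64(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit numbers held as 32-bit halves."""
    lo, carry = add32(a_lo, b_lo)            # ADD on the low double words
    hi, carry = add32(a_hi, b_hi, carry)     # ADC carries into the high ones
    return lo, hi, carry
```

Adding 1 to 00000000FFFFFFFF hex this way gives a low half of 0 and a high half of 1, because the carry out of the first step is added into the second.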
Frequently Used Program Control Instructions


The 80386’s program control instructions are generally conventional. Conditional 
jumps can use only relative offsets, which may be 8, 16, or 32 bits long. CALL and 
JMP, on the other hand, can use any addressing mode, including relative. Note that 
jumps work like other instructions (unlike on many other processors, where they act 
as though one level of indirection had been removed). For example, JMP EBX trans- 
fers control to the address in EBX, whereas JMP [EBX] transfers control to the address 
reached indirectly via EBX. 

The set of conditional jumps listed in Table 2-5 applies most often after a comparison
of unsigned numbers. If op1 and op2 are unsigned, the jumps work as follows after
CMP op1,op2:

JA (or JNBE)    jump if op1 > op2
JAE (or JNB)    jump if op1 ≥ op2
JB (or JNAE)    jump if op1 < op2
JBE (or JNA)    jump if op1 ≤ op2
JE              jump if op1 = op2
JNE             jump if op1 ≠ op2

The set listed in Table 2-6 applies most often after arithmetic or logical instructions. 

The jumps are: 


JC      jump if Carry = 1
JNC     jump if Carry = 0
JNS     jump if positive (Sign = 0)
JNZ     jump if result not zero (Zero = 0)
JS      jump if negative (Sign = 1)
JZ      jump if result zero (Zero = 1)


Note that some mnemonics are just different names for the same instruction. For ex- 
ample, JB and JC are equivalent, as are JAE and JNC, JE and JZ, and JNE and JNZ. 
The alternative mnemonics serve simply to make programs clearer. The same holds 
for the more obvious JA and JNBE, JAE and JNB, JBE and JNA, and JB and JNAE. 

INT (software interrupt or trap) is a special instruction that makes the processor do 
the following: 


1. Save EFLAGS, the code segment register, and the instruction pointer at the top of 
the stack. EFLAGS is pushed first, then CS, and finally EIP. 

2. Jump indirectly via an address determined from INT’s parameter (an integer less 
than 256). We will discuss the derivation of the target address later. The target 
address and its successors must contain new values for the instruction pointer and code 
segment register. 


INT is used to respond to external interrupts. It also often allows user programs to 
access built-in routines in an operating system such as MS-DOS or in firmware such 
as a BIOS (Basic Input/Output System). For example, on an MS-DOS computer, INT 
16H performs a keyboard operation such as reading a character, reporting whether a 
character is available, or obtaining the status of the shift key. For information on what 
INT instructions do in the IBM PC, see P. Norton, Programmer's Guide to the IBM 
PC, Microsoft Press, Redmond, WA, 1985. 


General Data Transfer Instructions 


Table 2-7 lists all 80386 data transfer instructions. Note the following: 


1. XLAT does a simple table lookup. It computes an effective address by adding AL 
to EBX. It then moves a byte from the effective address to AL. AL’s previous value 
is lost, but EBX is unaffected. This highly specialized instruction can handle any 
table with 8-bit elements. 

2. There are several conversion instructions. They can do either sign- or zero-extended 
conversions of bytes to words and words to double words. They can also do sign- 
extended conversions of double words to quad words. Conversion instructions are 
particularly useful for extending counters, indexes, and dividends. They are also 
important in compilers for doing type conversions. 
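
The XLAT lookup described in note 1 amounts to indexing a byte table. The following C sketch models it. The function names and the hexadecimal digit table are our own illustration, not part of the instruction set. 

```c
#include <stdint.h>

/* Model of XLAT: the new AL is the byte at (table base + old AL).
   The table pointer plays the role of EBX and is not changed. */
uint8_t xlat(const uint8_t *table, uint8_t al)
{
    return table[al];
}

/* Hypothetical use: translate a value from 0 to 15 into an ASCII hex digit. */
uint8_t hex_digit(uint8_t nibble)
{
    static const uint8_t digits[16] = {
        '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'
    };
    return xlat(digits, (uint8_t)(nibble & 0x0F));
}
```

Because AL is only 8 bits long, a table used this way can have at most 256 entries. 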


Examples 


1. The following instruction exchanges the value in AL with the value in the byte 
addressed by register EBP: 

XCHG AL,[EBP] 

XCHG replaces three MOVs, since a temporary holding place would be necessary 
to avoid overwriting a value. That is, the alternative sequence using register BL 
would be 

MOV BL,AL       ;SAVE FIRST OPERAND 
MOV AL,[EBP]    ;MOVE SECOND OPERAND 
MOV [EBP],BL    ;MOVE FIRST OPERAND 

Table 2-5 
Conditional Jumps After an Unsigned Comparison 

Mnemonic   Description 
JA         Jump if above (CF = 0 and ZF = 0) 
JAE        Jump if above or equal (CF = 0) 
JB         Jump if below (CF = 1) 
JBE        Jump if below or equal (CF = 1 or ZF = 1) 
JE         Jump if equal (ZF = 1) 
JNE        Jump if not equal (ZF = 0) 

Table 2-6 
Conditional Jumps After Logical or Unsigned Arithmetic Instructions 

Mnemonic   Description 
JC         Jump if carry (CF = 1) 
JNC        Jump if not carry (CF = 0) 
JNS        Jump if not sign (SF = 0) 
JNZ        Jump if not zero (ZF = 0) 
JS         Jump if sign (SF = 1) 
JZ         Jump if zero (ZF = 1) 


2. The following instruction extends the value in AL to 32 bits with zeros and puts the 
extended value in EAX: 
MOVZX EAX,AL 
Bits 8 through 31 are all zero. 
3. The following instruction converts the signed double word in EAX into a signed 
quad word in EDX and EAX: 
CDQ 
The more significant double word is in EDX. The extension propagates the most 
significant bit of EAX into all bits of EDX. No, CDQ are not the initials of a complete- 
ly unknown descendant of Johann Sebastian Bach. 
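
The zero and sign extensions in examples 2 and 3 can be modeled in C as follows. This is a sketch with function names of our own choosing. Each function returns the register contents that the corresponding instruction would produce. 

```c
#include <stdint.h>

/* MOVZX EAX,AL: bits 8 through 31 of the result are zero. */
uint32_t movzx_eax_al(uint8_t al)
{
    return (uint32_t)al;
}

/* MOVSX EAX,AL: bits 8 through 31 of the result copy bit 7 of AL. */
uint32_t movsx_eax_al(uint8_t al)
{
    return (al & 0x80) ? (0xFFFFFF00u | al) : (uint32_t)al;
}

/* CDQ: EDX receives 32 copies of bit 31 of EAX; EAX itself is unchanged. */
uint32_t cdq_edx(uint32_t eax)
{
    return (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}
```

With AL = 80 hex, the zero-extended result is 00000080 hex, whereas the sign-extended result is FFFFFF80 hex. 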
General Data Manipulation Instructions 


Tables 2-8 through 2-11 list the 80386’s arithmetic, string, logical, and bit manipula- 
tion instructions. String instructions (see Table 2-9), despite the name, are actually con- 
venient for handling any data array, not just character strings. The idea is to make one 
iteration in an array processing sequence into an instruction. We call such instructions 
string primitives, as they are building blocks for complex string operations. In general, 
the instruction must: 


1. Do an operation, such as loading, storing, comparing, or moving an element. Source 
and destination pointers define the addresses involved. Some operations, such as 
loading and storing, require only one address. Others, such as comparing or moving, 
require two. 

2. Update the pointers to reach the next elements or available addresses. The sign of 
the updating step depends on whether we are moving up (autoincrementing) or down 
(autodecrementing) through the array. The size of the step depends on the size of 
the elements. It is 1 if the elements are 8 bits, 2 if they are 16 bits, and 4 if they are 
32 bits. 


The instruction specifies the size of the step, either through a suffix (B, W, or D) or 
through an operand. The D (direction) flag determines the sign. It is 0 for 
autoincrementing, 1 for autodecrementing. 

One can make string instructions do even more by prefixing them with REP. It decre- 
ments a counter and repeats the operation if the result is not zero. The precise order of 
the steps is as follows: 


1. Check whether register ECX contains zero and exit if it does. Nothing happens if 
ECX is zero initially. 

2. Do the subsequent string instruction. REP works only with string instructions, not 
with moves or arithmetic or logical instructions. 

3. Subtract 1 from ECX and return to step 1. The decrement does not affect the flags. 
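
The three steps can be sketched in C for the specific case of REP STOSB. The structure and names below are ours. The array mem stands in for the memory addressed through EDI, and edi is treated as an index into it, which is an assumption of the model rather than a description of the hardware. 

```c
#include <stdint.h>

/* Registers and flag used by REP STOSB (model only). */
typedef struct {
    uint32_t ecx; /* repeat count */
    uint32_t edi; /* destination index */
    uint8_t  al;  /* byte to store */
    int      df;  /* direction flag: 0 = autoincrement, 1 = autodecrement */
} StosRegs;

void rep_stosb(StosRegs *r, uint8_t *mem)
{
    while (r->ecx != 0) {                   /* step 1: exit if ECX is zero */
        mem[r->edi] = r->al;                /* step 2: the string instruction */
        r->edi += r->df ? 0xFFFFFFFFu : 1u; /* pointer moves by -1 or +1 */
        r->ecx -= 1;                        /* step 3: decrement, then repeat */
    }
}
```

Note that nothing is stored at all when ECX is zero on entry, since the test in step 1 comes first. 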


Conditional REPs (REPE, REPNE, REPNZ, or REPZ) repeat the string instruction 
only as long as their condition holds. For example, REPZ repeats only as long as the 
Zero flag is 1. The processor checks the condition after each iteration, once it has 
decremented ECX, so the count also covers the element that ended the repetition. 
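
A conditional repeat can be modeled in the same way. The following C sketch covers REPE CMPSB with D = 0. The names are ours, and the two arrays stand in for the memory addressed through ESI and EDI. The model decrements ECX before it tests the Zero flag, which is the order given in Intel's reference description of REP. Treat that ordering as an assumption of the model. 

```c
#include <stdint.h>

typedef struct {
    uint32_t ecx; /* repeat count */
    uint32_t esi; /* source index */
    uint32_t edi; /* destination index */
    int      zf;  /* Zero flag; changes only when a comparison is made */
} CmpsRegs;

/* Model of REPE CMPSB with the direction flag clear. */
void repe_cmpsb(CmpsRegs *r, const uint8_t *src, const uint8_t *dst)
{
    while (r->ecx != 0) {
        r->zf = (src[r->esi] == dst[r->edi]); /* CMPSB sets the flags */
        r->esi += 1;
        r->edi += 1;
        r->ecx -= 1;  /* the count includes the element just compared */
        if (!r->zf)
            break;    /* REPE stops as soon as Zero = 0 */
    }
}
```

After the repeat ends, the Zero flag tells whether the last pair compared was equal. ESI and EDI point one element past that pair. 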

Note the following about other less frequently used data manipulation instructions: 


Table 2-7 
Data Transfer Instructions 

                                  Flags 
Mnemonic  Assembler Format        OF SF ZF AF PF CF TF IF DF NT RF 

GENERAL-PURPOSE 
MOV       MOV    dest,src         -  -  -  -  -  -  -  -  -  -  - 
MOV       MOV    control, debug   U  U  U  U  U  U  -  -  -  -  - 
POP       POP    dest             -  -  -  -  -  -  -  -  -  -  - 
POPA      POPA                    -  -  -  -  -  -  -  -  -  -  - 
PUSH      PUSH   src              -  -  -  -  -  -  -  -  -  -  - 
PUSHA     PUSHA                   -  -  -  -  -  -  -  -  -  -  - 
XCHG      XCHG   dest,src         -  -  -  -  -  -  -  -  -  -  - 
XLAT      XLAT   source-table     -  -  -  -  -  -  -  -  -  -  - 

CONVERSION 
CBW       CBW                     -  -  -  -  -  -  -  -  -  -  - 
CDQ       CDQ                     -  -  -  -  -  -  -  -  -  -  - 
CWD       CWD                     -  -  -  -  -  -  -  -  -  -  - 
CWDE      CWDE                    -  -  -  -  -  -  -  -  -  -  - 
MOVSX     MOVSX  reg,src          -  -  -  -  -  -  -  -  -  -  - 
MOVZX     MOVZX  reg,src          -  -  -  -  -  -  -  -  -  -  - 

INPUT/OUTPUT 
IN        IN     acc,port         -  -  -  -  -  -  -  -  -  -  - 
OUT       OUT    port,acc         -  -  -  -  -  -  -  -  -  -  - 

ADDRESS OBJECT 
LDS       LDS    reg,src          -  -  -  -  -  -  -  -  -  -  - 
LEA       LEA    reg,src          -  -  -  -  -  -  -  -  -  -  - 
LES       LES    reg,src          -  -  -  -  -  -  -  -  -  -  - 
LFS       LFS    reg,src          -  -  -  -  -  -  -  -  -  -  - 
LGS       LGS    reg,src          -  -  -  -  -  -  -  -  -  -  - 
LSS       LSS    reg,src          -  -  -  -  -  -  -  -  -  -  - 

FLAG MANIPULATION 
CLC       CLC                     -  -  -  -  -  0  -  -  -  -  - 
CLD       CLD                     -  -  -  -  -  -  -  -  0  -  - 
CMC       CMC                     -  -  -  -  -  C  -  -  -  -  - 
LAHF      LAHF                    -  -  -  -  -  -  -  -  -  -  - 
POPF      POPF                    R  R  R  R  R  R  R  R  R  R  R 
POPFD     POPFD                   R  R  R  R  R  R  R  R  R  R  R 
PUSHF     PUSHF                   -  -  -  -  -  -  -  -  -  -  - 
PUSHFD    PUSHFD                  -  -  -  -  -  -  -  -  -  -  - 
SAHF      SAHF                    -  R  R  R  R  R  -  -  -  -  - 
STC       STC                     -  -  -  -  -  1  -  -  -  -  - 
STD       STD                     -  -  -  -  -  -  -  -  1  -  - 

Key to Codes:  0 = instruction clears flag 
               1 = instruction sets flag 
               C = instruction complements (inverts) flag 
               R = instruction restores prior value of flag 
               U = instruction’s effect on flag is undefined 
               - = instruction does not affect flag 


1. DAA and DAS are special instructions for packed decimal addition and subtraction 
(that is, two digits per byte). They apply only to register AL and work only after an 
ADC, ADD, SBB, or SUB instruction. 
Table 2-8 
Arithmetic Instructions 

                                  Flags 
Mnemonic  Assembler Format        OF SF ZF AF PF CF TF IF DF NT RF 

ADDITION 
AAA       AAA                     U  U  U  TM U  M  -  -  -  -  - 
ADC       ADC    dest,src         M  M  M  M  M  TM -  -  -  -  - 
ADD       ADD    dest,src         M  M  M  M  M  M  -  -  -  -  - 
DAA       DAA                     U  M  M  TM M  TM -  -  -  -  - 
INC       INC    dest             M  M  M  M  M  -  -  -  -  -  - 

SUBTRACTION 
AAS       AAS                     U  U  U  TM U  M  -  -  -  -  - 
CMP       CMP    dest,src         M  M  M  M  M  M  -  -  -  -  - 
DAS       DAS                     U  M  M  TM M  TM -  -  -  -  - 
DEC       DEC    dest             M  M  M  M  M  -  -  -  -  -  - 
NEG       NEG    dest             M  M  M  M  M  M  -  -  -  -  - 
SBB       SBB    dest,src         M  M  M  M  M  TM -  -  -  -  - 
SUB       SUB    dest,src         M  M  M  M  M  M  -  -  -  -  - 

MULTIPLICATION 
AAM       AAM                     U  M  M  U  M  U  -  -  -  -  - 
IMUL      IMUL   acc,src          M  U  U  U  U  M  -  -  -  -  - 
MUL       MUL    acc,src          M  U  U  U  U  M  -  -  -  -  - 

DIVISION 
AAD       AAD                     U  M  M  U  M  U  -  -  -  -  - 
DIV       DIV    acc,src          U  U  U  U  U  U  -  -  -  -  - 
IDIV      IDIV   acc,src          U  U  U  U  U  U  -  -  -  -  - 

Key to Codes:  M = instruction modifies flag (effect depends on operands) 
               T = instruction tests flag 
               U = instruction’s effect on flag is undefined 
               - = instruction does not affect flag 



2. AAA, AAD, AAM, and AAS are special instructions for unpacked decimal (ASCII) 
arithmetic (that is, one digit per byte). They apply only to register AX. AA is for 
“ASCII adjust,” not to keep these instructions at the head of the alphabetical line. 

3. Bit manipulation instructions (BT, BTC, BTR, and BTS) set the Carry flag from the 
tested bit. The bit number can be either an immediate constant or the contents of a 
general-purpose register. 

4. Bit scan instructions (BSF and BSR) look for a 1 bit in the operand. They set the 
Zero flag if none exists and clear it otherwise. If they find a 1 bit, they return its index 
in the destination register. BSF (bit scan forward) starts the scan at bit 0, whereas 
BSR (bit scan reverse) starts the scan at the most significant bit. The MSB is bit 31 
for 32-bit operations or bit 15 for 16-bit operations. 
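
The bit test and bit scan instructions in notes 3 and 4 can be modeled in C as shown below. This is a sketch under our own naming. The scan functions return 1 when a 1 bit was found and 0 when the operand was all zeros, rather than modeling the Zero flag itself. 

```c
#include <stdint.h>

/* BTS on a 32-bit operand: returns the old value of the bit (what the
   instruction leaves in Carry), then sets the bit to 1. */
int bts32(uint32_t *operand, unsigned bit)
{
    uint32_t mask = 1u << (bit & 31u);
    int carry = (*operand & mask) != 0;
    *operand |= mask;
    return carry;
}

/* BSF: find the lowest 1 bit. *index is left alone if there is none. */
int bsf32(uint32_t operand, unsigned *index)
{
    unsigned i = 0;
    if (operand == 0)
        return 0;
    while (((operand >> i) & 1u) == 0)
        i++;
    *index = i;
    return 1;
}

/* BSR: find the highest 1 bit, starting the scan at bit 31. */
int bsr32(uint32_t operand, unsigned *index)
{
    unsigned i = 31;
    if (operand == 0)
        return 0;
    while (((operand >> i) & 1u) == 0)
        i--;
    *index = i;
    return 1;
}
```

For the operand 00000018 hex, the forward scan reports bit 3 and the reverse scan reports bit 4. 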


Examples 


1. DAA. This converts an 8-bit binary sum in AL into a decimal sum, using the Carry 
and Auxiliary Carry flags. It works only after ADC AL or ADD AL. 

2. MOVSD. This moves data from the address in ESI to the one in EDI. It then 
updates both ESI and EDI according to the D flag’s value. The pointers are increased 
if D is 0 and decreased if it is 1. The step is always sufficient to reach the next 
element; that is, the step is 4 for MOVSD, 2 for MOVSW, and 1 for MOVSB. MOVS 
provides a memory-to-memory move. The data never occupies a user register. 

3. REP STOSB. This sequence first checks whether ECX is 0. If it is, nothing happens. 
If not, the processor stores the contents of AL at the address in EDI. It then updates 
EDI according to the D flag’s value. The step is +1 if D is 0, -1 if D is 1. Finally, 
the processor subtracts 1 from ECX and starts the operation over again. 

Table 2-9 
String Instructions 

                                  Flags 
Mnemonic  Assembler Format*       OF SF ZF AF PF CF TF IF DF NT RF 

CMPS      CMPS   dest,src         M  M  M  M  M  M  -  -  T  -  - 
INS       INS    dest,DX          -  -  -  -  -  -  -  -  T  -  - 
LODS      LODS   src              -  -  -  -  -  -  -  -  T  -  - 
MOVS      MOVS   dest,src         -  -  -  -  -  -  -  -  T  -  - 
OUTS      OUTS   DX,src           -  -  -  -  -  -  -  -  T  -  - 
REP       REP                     -  -  -  -  -  -  -  -  -  -  - 
REPE      REPE                    -  -  T  -  -  -  -  -  -  -  - 
REPNE     REPNE                   -  -  T  -  -  -  -  -  -  -  - 
REPNZ     REPNZ                   -  -  T  -  -  -  -  -  -  -  - 
REPZ      REPZ                    -  -  T  -  -  -  -  -  -  -  - 
SCAS      SCAS   dest             M  M  M  M  M  M  -  -  T  -  - 
STOS      STOS   dest             -  -  -  -  -  -  -  -  T  -  - 

* The string primitives also have common formats with implied operands. 
These use a suffix of B, W, or D depending on the size of the data transfer (for 
example, CMPSB, LODSW, and SCASD). 

Key to Codes:  M = instruction modifies flag (effect depends on operands) 
               T = instruction tests flag 
               U = instruction’s effect on flag is undefined 
               - = instruction does not affect flag 


4. BTS EAX,4. This instruction sets the Carry flag from bit 4 of register EAX, then 
sets that bit to 1. The other flags are not affected. Note that all bit manipulation in- 
structions test a bit. The differences among them are whether and how they change 
the bit afterward. 
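
Returning to example 1, the decimal adjustment that DAA performs after ADD AL can be modeled in C. This is a sketch with names of our own. The adjustment rule follows the one given in Intel's current instruction reference, which we assume also describes the 80386 for valid packed decimal operands. 

```c
#include <stdint.h>

typedef struct {
    uint8_t al; /* adjusted sum: two packed decimal digits */
    int     cf; /* Carry flag: decimal carry out of the byte */
    int     af; /* Auxiliary Carry flag: carry out of bit 3 */
} BcdState;

/* Model of ADD AL,value followed by DAA for packed decimal operands. */
BcdState add_then_daa(uint8_t al, uint8_t value)
{
    BcdState s;
    unsigned sum = (unsigned)al + value;   /* the binary ADD */
    uint8_t old_al;
    int old_cf;

    s.al = (uint8_t)sum;
    s.cf = sum > 0xFF;
    s.af = ((al & 0x0F) + (value & 0x0F)) > 0x0F;

    old_al = s.al;
    old_cf = s.cf;
    if ((old_al & 0x0F) > 9 || s.af) {     /* low digit needs correction */
        s.al = (uint8_t)(s.al + 0x06);
        s.af = 1;
    } else {
        s.af = 0;
    }
    if (old_al > 0x99 || old_cf) {         /* high digit needs correction */
        s.al = (uint8_t)(s.al + 0x60);
        s.cf = 1;
    } else {
        s.cf = 0;
    }
    return s;
}
```

For example, adding 38 hex and 45 hex gives the binary sum 7D hex, which the adjustment turns into 83 hex, the packed decimal form of 38 + 45 = 83. 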
Table 2-10 
Logical Instructions 

                                   Flags 
Mnemonic   Assembler Format        OF SF ZF AF PF CF TF IF DF NT RF 

LOGICALS 
AND        AND    dest,src         0  M  M  U  M  0  -  -  -  -  - 
NOT        NOT    dest             -  -  -  -  -  -  -  -  -  -  - 
OR         OR     dest,src         0  M  M  U  M  0  -  -  -  -  - 
TEST       TEST   dest,src         0  M  M  U  M  0  -  -  -  -  - 
XOR        XOR    dest,src         0  M  M  U  M  0  -  -  -  -  - 

SHIFTS 
SAL 1      SAL    dest,1           M  M  M  U  M  M  -  -  -  -  - 
SAL count  SAL    dest,count       U  M  M  U  M  M  -  -  -  -  - 
SAR 1      SAR    dest,1           M  M  M  U  M  M  -  -  -  -  - 
SAR count  SAR    dest,count       U  M  M  U  M  M  -  -  -  -  - 
SHL 1      SHL    dest,1           M  M  M  U  M  M  -  -  -  -  - 
SHL count  SHL    dest,count       U  M  M  U  M  M  -  -  -  -  - 
SHLD       SHLD   dest,src,count   U  M  M  U  M  M  -  -  -  -  - 
SHR 1      SHR    dest,1           M  M  M  U  M  M  -  -  -  -  - 
SHR count  SHR    dest,count       U  M  M  U  M  M  -  -  -  -  - 
SHRD       SHRD   dest,src,count   U  M  M  U  M  M  -  -  -  -  - 

ROTATES 
RCL 1      RCL    dest,1           M  -  -  -  -  TM -  -  -  -  - 
RCL count  RCL    dest,count       U  -  -  -  -  TM -  -  -  -  - 
RCR 1      RCR    dest,1           M  -  -  -  -  TM -  -  -  -  - 
RCR count  RCR    dest,count       U  -  -  -  -  TM -  -  -  -  - 
ROL 1      ROL    dest,1           M  -  -  -  -  M  -  -  -  -  - 
ROL count  ROL    dest,count       U  -  -  -  -  M  -  -  -  -  - 
ROR 1      ROR    dest,1           M  -  -  -  -  M  -  -  -  -  - 
ROR count  ROR    dest,count       U  -  -  -  -  M  -  -  -  -  - 

Key to Codes:  M = instruction modifies flag (effect depends on operands) 
               T = instruction tests flag 
               U = instruction’s effect on flag is undefined 
               - = instruction does not affect flag 
               0 = instruction clears flag 

Table 2-11 
Bit Manipulation Instructions 

                                  Flags 
Mnemonic  Assembler Format        OF SF ZF AF PF CF TF IF DF NT RF 

BSF       BSF    dest,src         U  U  M  U  U  U  -  -  -  -  - 
BSR       BSR    dest,src         U  U  M  U  U  U  -  -  -  -  - 
BT        BT     base,offset      U  U  U  U  U  M  -  -  -  -  - 
BTC       BTC    base,offset      U  U  U  U  U  M  -  -  -  -  - 
BTR       BTR    base,offset      U  U  U  U  U  M  -  -  -  -  - 
BTS       BTS    base,offset      U  U  U  U  U  M  -  -  -  -  - 

Key to Codes:  M = instruction modifies flag (effect depends on operands) 
               U = instruction’s effect on flag is undefined 
               - = instruction does not affect flag 


General Program Control Instructions 


Table 2-12 lists all 80386 program control instructions. Table 2-13 describes the con- 
ditional jumps in more detail. Note the following: 


1. LOOP subtracts 1 from register ECX, then branches if the result is not zero. It is 
thus equivalent to the common DEC ECX, JNE sequence at the end of a loop. Only 
ECX can be used in this manner. Conditional LOOPs also check whether a 
condition holds before branching. That is, a conditional LOOP continues the iterations as 
long as the condition holds and ECX is nonzero. 

2. JECXZ jumps if register ECX contains 0. It can thus test whether a conditional 
LOOP or REP continued through all its iterations (that is, reduced ECX to zero). A 
JECXZ after the loop will branch if ECX is zero and will continue otherwise. The 
nonzero condition means that the exit occurred before the normal end of the loop 
(that is, because the condition no longer held). The program may then have to 
complete the loop before proceeding. The existence of LOOP and JECXZ makes it 
preferable to use ECX as a counter whenever possible. JECXZ can also test whether 
a counter is zero before a loop begins. It can thus provide an immediate exit in case 
the loop should not be executed at all. 

3. The SETcc instructions set a byte to 1 if the condition holds and to 0 if it does not. 
These instructions convert a condition into a stored value for later testing or use in 
boolean expressions in high-level languages. The conditions have the same 
mnemonics and meanings as described for conditional jumps in Table 2-13. 
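
The notes above can be illustrated with a C model of a counted loop. This is a sketch with names of our own. The do-while construct mirrors a LOOP instruction at the bottom of the loop, and the initial test mirrors a JECXZ placed before the loop. 

```c
#include <stdint.h>

/* Sum count bytes the way a loop ending in LOOP would. */
uint32_t sum_bytes(const uint8_t *data, uint32_t count)
{
    uint32_t ecx = count;
    uint32_t total = 0;
    uint32_t i = 0;

    if (ecx == 0)          /* JECXZ: skip the loop if the count is zero */
        return total;
    do {
        total += data[i++];
    } while (--ecx != 0);  /* LOOP: subtract 1, branch if result is not zero */
    return total;
}

/* SETB after CMP a,b: store 1 if a is below b (unsigned), otherwise 0. */
uint8_t setb(uint32_t a, uint32_t b)
{
    return (uint8_t)(a < b);
}
```

Without the initial test, a count of zero would wrap around on the first decrement and the loop body would run 2^32 times. That is why the JECXZ guard is worth its two bytes. 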


Other Instructions 


Tables 2-14 through 2-16 list the other 80386 instructions. The categories are high- 
level language support (Table 2-14), protection model (Table 2-15), and processor con- 
trol (Table 2-16). We will describe protection model instructions in Chapter 5. The 
high-level language instructions provide quick ways to check array bounds and assign 
and delete parameter blocks for procedure entries. The processor control instructions 
include ESC, used for instructions intended for the numerical coprocessor, and LOCK, 
which controls a signal used in multiprocessing applications. 
Table 2-12 
Program Control Instructions 

                                  Flags 
Mnemonic  Assembler Format        OF SF ZF AF PF CF TF IF DF NT RF 

CONDITIONAL TRANSFERS 
JA        JA     dest             -  -  T  -  -  T  -  -  -  -  - 
JAE       JAE    dest             -  -  -  -  -  T  -  -  -  -  - 
JB        JB     dest             -  -  -  -  -  T  -  -  -  -  - 
JBE       JBE    dest             -  -  T  -  -  T  -  -  -  -  - 
JC        JC     dest             -  -  -  -  -  T  -  -  -  -  - 
JCXZ      JCXZ   dest             -  -  -  -  -  -  -  -  -  -  - 
JE        JE     dest             -  -  T  -  -  -  -  -  -  -  - 
JECXZ     JECXZ  dest             -  -  -  -  -  -  -  -  -  -  - 
JG        JG     dest             T  T  T  -  -  -  -  -  -  -  - 
JGE       JGE    dest             T  T  -  -  -  -  -  -  -  -  - 
JL        JL     dest             T  T  -  -  -  -  -  -  -  -  - 
JLE       JLE    dest             T  T  T  -  -  -  -  -  -  -  - 
JNA       JNA    dest             -  -  T  -  -  T  -  -  -  -  - 
JNAE      JNAE   dest             -  -  -  -  -  T  -  -  -  -  - 
JNB       JNB    dest             -  -  -  -  -  T  -  -  -  -  - 
JNBE      JNBE   dest             -  -  T  -  -  T  -  -  -  -  - 
JNC       JNC    dest             -  -  -  -  -  T  -  -  -  -  - 
JNE       JNE    dest             -  -  T  -  -  -  -  -  -  -  - 
JNG       JNG    dest             T  T  T  -  -  -  -  -  -  -  - 
JNGE      JNGE   dest             T  T  -  -  -  -  -  -  -  -  - 
JNL       JNL    dest             T  T  -  -  -  -  -  -  -  -  - 
JNLE      JNLE   dest             T  T  T  -  -  -  -  -  -  -  - 
JNO       JNO    dest             T  -  -  -  -  -  -  -  -  -  - 
JNP       JNP    dest             -  -  -  -  T  -  -  -  -  -  - 
JNS       JNS    dest             -  T  -  -  -  -  -  -  -  -  - 
JNZ       JNZ    dest             -  -  T  -  -  -  -  -  -  -  - 
JO        JO     dest             T  -  -  -  -  -  -  -  -  -  - 
JP        JP     dest             -  -  -  -  T  -  -  -  -  -  - 
JPE       JPE    dest             -  -  -  -  T  -  -  -  -  -  - 
JPO       JPO    dest             -  -  -  -  T  -  -  -  -  -  - 
JS        JS     dest             -  T  -  -  -  -  -  -  -  -  - 
JZ        JZ     dest             -  -  T  -  -  -  -  -  -  -  - 
SETA      SETA   dest             -  -  T  -  -  T  -  -  -  -  - 
SETAE     SETAE  dest             -  -  -  -  -  T  -  -  -  -  - 
SETB      SETB   dest             -  -  -  -  -  T  -  -  -  -  - 
SETBE     SETBE  dest             -  -  T  -  -  T  -  -  -  -  - 
SETC      SETC   dest             -  -  -  -  -  T  -  -  -  -  - 
SETE      SETE   dest             -  -  T  -  -  -  -  -  -  -  - 
SETG      SETG   dest             T  T  T  -  -  -  -  -  -  -  - 
SETGE     SETGE  dest             T  T  -  -  -  -  -  -  -  -  - 
SETL      SETL   dest             T  T  -  -  -  -  -  -  -  -  - 
SETLE     SETLE  dest             T  T  T  -  -  -  -  -  -  -  - 
SETNA     SETNA  dest             -  -  T  -  -  T  -  -  -  -  - 
SETNAE    SETNAE dest             -  -  -  -  -  T  -  -  -  -  - 
SETNB     SETNB  dest             -  -  -  -  -  T  -  -  -  -  - 
SETNBE    SETNBE dest             -  -  T  -  -  T  -  -  -  -  - 
SETNC     SETNC  dest             -  -  -  -  -  T  -  -  -  -  - 
SETNE     SETNE  dest             -  -  T  -  -  -  -  -  -  -  - 
SETNG     SETNG  dest             T  T  T  -  -  -  -  -  -  -  - 
SETNGE    SETNGE dest             T  T  -  -  -  -  -  -  -  -  - 
SETNL     SETNL  dest             T  T  -  -  -  -  -  -  -  -  - 
SETNLE    SETNLE dest             T  T  T  -  -  -  -  -  -  -  - 
SETNO     SETNO  dest             T  -  -  -  -  -  -  -  -  -  - 
SETNP     SETNP  dest             -  -  -  -  T  -  -  -  -  -  - 
SETNS     SETNS  dest             -  T  -  -  -  -  -  -  -  -  - 
SETNZ     SETNZ  dest             -  -  T  -  -  -  -  -  -  -  - 
SETO      SETO   dest             T  -  -  -  -  -  -  -  -  -  - 
SETP      SETP   dest             -  -  -  -  T  -  -  -  -  -  - 
SETPE     SETPE  dest             -  -  -  -  T  -  -  -  -  -  - 
SETPO     SETPO  dest             -  -  -  -  T  -  -  -  -  -  - 
SETS      SETS   dest             -  T  -  -  -  -  -  -  -  -  - 
SETZ      SETZ   dest             -  -  T  -  -  -  -  -  -  -  - 

CALL      CALL   dest             -  -  -  -  -  -  -  -  -  -  - 
JMP       JMP    dest             -  -  -  -  -  -  -  -  -  -  - 
RET       RET or RET num*         -  -  -  -  -  -  -  -  -  -  - 

ITERATION CONTROLS 
LOOP      LOOP   dest             -  -  -  -  -  -  -  -  -  -  - 
LOOPE     LOOPE  dest             -  -  T  -  -  -  -  -  -  -  - 
LOOPNE    LOOPNE dest             -  -  T  -  -  -  -  -  -  -  - 
LOOPNZ    LOOPNZ dest             -  -  T  -  -  -  -  -  -  -  - 
LOOPZ     LOOPZ  dest             -  -  T  -  -  -  -  -  -  -  - 

INTERRUPTS 
CLI       CLI                     -  -  -  -  -  -  -  0  -  -  - 
INT       INT    inttype          -  -  -  -  -  -  0  -  -  0  - 
INTO      INTO                    T  -  -  -  -  -  0  -  -  0  - 
IRET      IRET                    R  R  R  R  R  R  R  R  R  T  - 
STI       STI                     -  -  -  -  -  -  -  1  -  -  - 

* num is the number of bytes popped from stack 

Key to Codes:  R = instruction restores prior value of flag 
               T = instruction tests flag 
               - = instruction does not affect flag 
               0 = instruction resets flag 
               1 = instruction sets flag 


ADDRESS AND OPERAND SIZE 


Most 80386 instructions, as we have mentioned, can operate on 8-bit, 16-bit, or 32-bit 
operands. The ways of specifying lengths are: 

• Through register operands 

• Through operands derived from assembler variables with specific types 

• Through explicit designations (type overrides) such as BYTE PTR, 
WORD PTR, and DWORD PTR 

The differentiation between 16-bit and 32-bit operands and addresses requires more 
explanation. In fact, each code segment has a bit in its definition (or descriptor) that 
determines whether it is a 16-bit or a 32-bit segment. A 16-bit segment uses 16-bit 
addresses and operands unless otherwise specified. A 32-bit segment similarly uses 
32-bit addresses and operands. The programmer assigns the address and operand size for 
each segment with a USE directive in an assembly language program. 

You can override the segment specification for addresses or operands. That is, you 
can force a 16-bit segment to use 32-bit addresses, 32-bit operands, or both, and vice 
versa for a 32-bit segment. The usual way to do this is simply to specify the other 
length through a register operand or a PTR operator. 

Note, however, that the assembler actually generates an address override or operand 
override prefix byte. Thus a 32-bit instruction in a 16-bit segment requires 1 or 2 over- 
ride bytes. The same holds for a 16-bit instruction in a 32-bit segment. 
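
The rule for counting override bytes can be written as a short C function. This is a sketch. The function and parameter names are ours, and the prefix values 66 hex (operand size) and 67 hex (address size) come from Intel's opcode map rather than from the discussion above. 

```c
/* Number of override prefix bytes that one instruction needs.
   segment_bits: 16 or 32, the default size of the code segment.
   operand_bits: 8, 16, or 32. Byte operands never need a prefix.
   address_bits: 16 or 32, or 0 if there is no memory operand. */
int override_prefix_count(int segment_bits, int operand_bits, int address_bits)
{
    int count = 0;

    if (operand_bits != 8 && operand_bits != segment_bits)
        count++;  /* operand-size override prefix, 66 hex */
    if (address_bits != 0 && address_bits != segment_bits)
        count++;  /* address-size override prefix, 67 hex */
    return count;
}
```

For example, a 32-bit register-to-register operation in a 16-bit segment costs one extra byte, whereas a 32-bit operation through a 32-bit address costs two. 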

Why use an override? Among the reasons are: 

• To increase the speed or extend the addressing capability of 16-bit 
programs without rewriting them completely. 

• To match the bit widths of I/O ports or channels. 

• To save memory by using 16-bit arrays or tables rather than 32-bit. 

• To agree with type definitions used in high-level language programs. 
Note that a common reason for using 16-bit data types in high-level 
languages (such as C and Pascal) is to save time and memory. Now, on the 
80386, there is actually a penalty (the override bytes) for using 16-bit 
operands in a 32-bit segment. 

• To interface correctly with programs written in high-level languages. 

Some uses of overrides make sense only during the transitional period as 32-bit 
processors succeed 16-bit devices. 

In fact, the 80386 does not have separate 8-, 16-, and 32-bit instructions. It has: 

• 8-bit instructions 

• Instructions conforming to the segment’s type (16- or 32-bit) 
Table 2-13 
Conditional Jump Instructions 

Instruction  Description                      Condition (Jump if...) 
JA           Jump If Above                    CF = 0 and ZF = 0 
JAE          Jump If Above or Equal           CF = 0 
JB           Jump If Below                    CF = 1 
JBE          Jump If Below or Equal           CF = 1 or ZF = 1 
JC           Jump If Carry                    CF = 1 
JCXZ         Jump If CX Is Zero               (CX) = 0 
JE           Jump If Equal                    ZF = 1 
JECXZ        Jump If ECX Is Zero              (ECX) = 0 
*JG          Jump If Greater                  ZF = 0 and SF = OF 
*JGE         Jump If Greater or Equal         SF = OF 
*JL          Jump If Less                     SF ≠ OF 
*JLE         Jump If Less or Equal            ZF = 1 or SF ≠ OF 
JNA          Jump If Not Above                CF = 1 or ZF = 1 
JNAE         Jump If Not Above or Equal       CF = 1 
JNB          Jump If Not Below                CF = 0 
JNBE         Jump If Not Below or Equal       CF = 0 and ZF = 0 
JNC          Jump If No Carry                 CF = 0 
JNE          Jump If Not Equal                ZF = 0 
*JNG         Jump If Not Greater              ZF = 1 or SF ≠ OF 
*JNGE        Jump If Not Greater or Equal     SF ≠ OF 
*JNL         Jump If Not Less                 SF = OF 
*JNLE        Jump If Not Less or Equal        ZF = 0 and SF = OF 
*JNO         Jump If No Overflow              OF = 0 
JNP          Jump If Parity Odd               PF = 0 
JNS          Jump If Sign Positive            SF = 0 
JNZ          Jump If Not Zero                 ZF = 0 
*JO          Jump If Overflow                 OF = 1 
JP           Jump If Parity Even              PF = 1 
JPE          Jump If Parity Even              PF = 1 
JPO          Jump If Parity Odd               PF = 0 
JS           Jump If Sign Negative            SF = 1 
JZ           Jump If Zero                     ZF = 1 

* Used mainly to deal with signed (two’s complement) operands. 

Note 1: The Parity flag is 1 if the parity of a byte is even and 0 if it is odd. 

Note 2: The Sign flag is the most significant bit of the latest result. It is 1 if that 
result was a negative signed number and 0 if it was a positive signed number. 
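
The signed conditions marked with an asterisk in Table 2-13 compare the Sign and Overflow flags rather than testing Carry. The following C sketch shows why this gives the right answer even when the subtraction inside CMP overflows. The names are ours, and the model computes only the three flags that the signed jumps test. 

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool sf; /* Sign: bit 31 of op1 - op2 */
    bool of; /* Overflow: the signed difference did not fit in 32 bits */
    bool zf; /* Zero: op1 == op2 */
} SFlags;

/* Flags after CMP op1,op2 on 32-bit operands. */
SFlags cmp_s32(int32_t op1, int32_t op2)
{
    uint32_t a = (uint32_t)op1;
    uint32_t b = (uint32_t)op2;
    uint32_t diff = a - b;
    SFlags f;

    f.zf = diff == 0;
    f.sf = (diff >> 31) != 0;
    /* Overflow occurs when the operands have different signs and the
       sign of the result differs from the sign of op1. */
    f.of = (((a ^ b) & (a ^ diff)) >> 31) != 0;
    return f;
}

bool jl(SFlags f)  { return f.sf != f.of; }          /* also JNGE */
bool jge(SFlags f) { return f.sf == f.of; }          /* also JNL  */
bool jg(SFlags f)  { return !f.zf && f.sf == f.of; } /* also JNLE */
bool jle(SFlags f) { return f.zf || f.sf != f.of; }  /* also JNG  */
```

Comparing the largest negative number with 1 leaves Sign = 0 and Overflow = 1, so JL is still taken even though the raw difference looks positive. 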


Table 2-14 
High-Level Language Instructions 

                                  Flags 
Mnemonic  Assembler Format        OF SF ZF AF PF CF TF IF DF NT RF 

BOUND     BOUND  reg,bound        -  -  -  -  -  -  -  -  -  -  - 
ENTER     ENTER  storage,level    -  -  -  -  -  -  -  -  -  -  - 
LEAVE     LEAVE                   -  -  -  -  -  -  -  -  -  -  - 

Key to Codes:  - = instruction does not affect flag 


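Table 2-15 
Protection Model Instructions 

                                  Flags 
Mnemonic  Assembler Format        OF SF ZF AF PF CF TF IF DF NT RF 

ARPL      ARPL   sel,reg          -  -  M  -  -  -  -  -  -  -  - 
CLTS      CLTS                    -  -  -  -  -  -  -  -  -  -  - 
LAR       LAR    reg,src          -  -  M  -  -  -  -  -  -  -  - 
LGDT      LGDT   src              -  -  -  -  -  -  -  -  -  -  - 
LIDT      LIDT   src              -  -  -  -  -  -  -  -  -  -  - 
LLDT      LLDT   src              -  -  -  -  -  -  -  -  -  -  - 
LMSW      LMSW   src              -  -  -  -  -  -  -  -  -  -  - 
LSL       LSL    reg,src          -  -  M  -  -  -  -  -  -  -  - 
LTR       LTR    src              -  -  -  -  -  -  -  -  -  -  - 
SGDT      SGDT   dest             -  -  -  -  -  -  -  -  -  -  - 
SIDT      SIDT   dest             -  -  -  -  -  -  -  -  -  -  - 
SLDT      SLDT   dest             -  -  -  -  -  -  -  -  -  -  - 
SMSW      SMSW   dest             -  -  -  -  -  -  -  -  -  -  - 
STR       STR    dest             -  -  -  -  -  -  -  -  -  -  - 
VERR      VERR   sel              -  -  M  -  -  -  -  -  -  -  - 
VERW      VERW   sel              -  -  M  -  -  -  -  -  -  -  - 

Key to Codes:  M = instruction modifies flag (effect depends on operands) 
               - = instruction does not affect flag 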


Table 2-16 
Processor Control Instructions 

                                  Flags 
Mnemonic  Assembler Format        OF SF ZF AF PF CF TF IF DF NT RF 

ESC       ESC                     -  -  -  -  -  -  -  -  -  -  - 
HLT       HLT                     -  -  -  -  -  -  -  -  -  -  - 
LOCK      LOCK                    -  -  -  -  -  -  -  -  -  -  - 
NOP       NOP                     -  -  -  -  -  -  -  -  -  -  - 
WAIT      WAIT                    -  -  -  -  -  -  -  -  -  -  - 

• Overrides for address or operand length that can be applied to the 
conforming instructions 


INSTRUCTION SPEEDUPS 


Two major reasons for the 80386’s improved performance over earlier processors are 
changes in its multiplication and shifting methods. The multiplication instructions use 
an early-out method. It recognizes when the rest of the multiplier is zero and quits. This 
is like noting that the problems 
3106 x 25 and 3106 x 2500 

are the same except for position. Previous processors, like a child just learning to 
multiply, continued through the zeros. The change increases speed greatly in common 
situations with small multipliers. 
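
The early-out idea can be shown with a shift-and-add multiply in C. This is a sketch of the general technique under our own naming. It is not a description of the 80386’s actual internal circuitry, which the text above does not specify. 

```c
#include <stdint.h>

/* Shift-and-add multiply that quits as soon as the rest of the multiplier
   is zero. *steps receives the number of iterations actually performed. */
uint64_t early_out_multiply(uint32_t multiplicand, uint32_t multiplier,
                            int *steps)
{
    uint64_t product = 0;
    uint64_t addend = multiplicand;

    *steps = 0;
    while (multiplier != 0) {   /* early out: no 1 bits remain */
        if (multiplier & 1u)
            product += addend;  /* add the shifted multiplicand */
        addend <<= 1;
        multiplier >>= 1;
        (*steps)++;
    }
    return product;
}
```

Multiplying 3106 by 25 takes only 5 steps here, since 25 occupies 5 bits. A method without the early out would always take 32. 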

The 80386 does all shifts in a “barrel shifter.” This device simply selects the proper 
output for each bit position rather than shifting the data one bit position at a time. The 
result is that all shifts take the same amount of time, regardless of how far they move 
the data. Multiple-bit shifts are common in arithmetic, communications, graphics, and 
signal processing applications. 


INSTRUCTIONS AND FLAGS 


80386 instructions have highly individualized effects on the flags. The only way to be 
sure of what happens is to use a reference such as Appendix B or Tables 2-7 through 
2-16. Remember the following: 

• Data transfer instructions such as MOV, IN, and OUT do not affect any 
flags. The common way to set the flags from a register’s contents is with 
TEST reg,reg 
This does not affect the register. OR reg,reg and AND reg,reg are 
equivalent but not as obvious. 

• Logical instructions, for reasons that escape me, always clear Carry, 
except for NOT, which does not affect any flags at all. 

• INC and DEC do not affect Carry. This allows their use in loops that do 
multiple-precision arithmetic. The Carry is needed to transfer a bit from 
one iteration to the next. To add or subtract 1 with an effect on Carry, use 
ADD op1,1 or SUB op1,1. 

• Bit test instructions affect only the Carry flag. 

• Bit scan instructions affect only the Zero flag. 

• Rotate instructions (RCL, RCR, ROL, and ROR) affect only the Carry 
and Overflow flags. 

• Type conversion and sign extension instructions do not affect the flags. 

• The decrementing of counters and updating of pointers in LOOP, REP, 
and string instructions do not affect the flags. CMPS and SCAS affect 
the flags only through the comparison; the other string instructions do 
not affect the flags. 

• NOT (1’s complement) does not affect the flags but NEG (2’s 
complement) does. 

• Divides (DIV, IDIV) leave the arithmetic flags undefined, whereas 
multiplies (IMUL, MUL) set Carry and Overflow. 
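
The point about INC and DEC is easiest to see in a multiple-precision addition. The following C sketch uses names of our own. The variable carry plays the role of the Carry flag, which must survive the index and counter updates between one ADC and the next. 

```c
#include <stdint.h>

/* Add the n-word number b to the n-word number a, least significant word
   first. The sum replaces a. Returns the final carry out. */
int multiword_add(uint32_t *a, const uint32_t *b, uint32_t n)
{
    int carry = 0;  /* Carry starts clear, as after CLC */
    uint32_t i;

    /* Advancing i models the pointer updates. On the 80386 these would be
       INC or DEC instructions, which leave Carry untouched. */
    for (i = 0; i < n; i++) {
        uint64_t sum = (uint64_t)a[i] + b[i] + (uint64_t)carry; /* ADC */
        a[i] = (uint32_t)sum;
        carry = (int)(sum >> 32);
    }
    return carry;
}
```

If the pointer updates changed Carry, as ADD op1,1 would, the carry from each word would be lost before the next ADC could use it. 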


8086 AND 80286 COMPATIBILITY 


Many software authors want their programs to run on a wide range of hardware. For 
example, most personal computer programs must run on 80386-, 80286-, and 
8086/8088-based computers (that is, on advanced PCs, ATs, and standard PCs or across 
the IBM Personal System/2 line). It is then important to know which instructions work 
on all processors. Table 2-17 lists 80286 instructions that are not available on the 8086 
and 8088 processors. Table 2-18 lists 80386 instructions that are not available on the 
80286. We omit operation codes such as MOVSD and PUSHAD that are simply 32-bit 
versions of existing 16-bit instructions. Obviously, several levels of compatibility 
are possible here: 


1. Programs that must run on 8086- or 8088-based computers cannot use any instruc- 
tions from Tables 2-17 and 2-18. This applies to MS-DOS programs. 

2. Programs that must run on 80286-based computers cannot use any instructions from 
Table 2-18. Such programs may not run on 8086/8088-based computers. This 
applies to OS/2 programs. 


The only new addressing mode in the 80286 or 80386 is the 80386’s scaled 
indexing. It lets the processor multiply the index by a factor of 2, 4, or 8. Note also that a 
few instructions (IMUL, PUSH, and POP) have an immediate mode on the 80286 and 
80386 that does not exist on the 8086/8088. 

There are other minor differences that are rare in practice. Here we will mention 
only those having to do with the ordinary instruction set rather than with segmentation 
or interrupts. 


1. PUSH SP. This instruction saves the value of the stack pointer before decrementing 
it on the 80286 and 80386 but the value afterward on the 8086. 

2. Shift and rotate counts. The 80286 and 80386 mask these counts to the low-order 5 
bits, whereas the 8086 does not. 

3. Largest negative quotient. The 80286 and 80386 can produce the largest negative 
number as the quotient of a signed divide, whereas the 8086 produces an exception. 

4. Flags in stack. The setting of the flags in the stack differs slightly. On the 80286 and 
80386, bits 12 through 14 are in use, whereas on the 8086, they are always 1s. 
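
The difference in shift counts (item 2) can be made concrete with a C model. This is a sketch with names of our own. The second function assumes, as the text states, that the 8086 honors the full 8-bit count. 

```c
#include <stdint.h>

/* SHL as on the 80286 and 80386: only the low-order 5 bits of the
   count are used. */
uint32_t shl_masked(uint32_t value, uint8_t count)
{
    return value << (count & 31);
}

/* SHL on a 16-bit operand with the whole count honored: the operand
   moves one bit position at a time, count times. */
uint16_t shl16_unmasked(uint16_t value, uint8_t count)
{
    while (count-- != 0)
        value = (uint16_t)(value << 1);
    return value;
}
```

With a count of 33, the masked shift moves the operand by only one position, whereas the unmasked shift empties a 16-bit operand completely. 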


There are also the obvious physical and clocking differences. The 80386 runs faster 
than its predecessors and takes fewer clock cycles to do many instructions. It also has 
new instructions that replace undefined operation codes on the earlier devices. 


ASSEMBLER DIRECTIVES 


Before using 80386 instructions to write actual programs, we must introduce a few as- 
sembler directives or pseudo-operations. These identify procedures and assign places 
in memory to fixed data, instructions, and storage areas. We will not deal with direc- 
tives that control segmentation or implement advanced features such as macros and 
conditional assembly. We have used notation from Microsoft’s Macro Assembler. 
Most other assemblers are similar. 
The common directives are: 

DB     Define byte 
DD     Define double word 
DQ     Define quad word 
DT     Define 10 bytes (for use with floating point numbers) 
DW     Define word 
END    End of program 
EQU    Equate, define symbolic name 

These are all standard operations. The data definition directives may specify an 
undefined initial value with the ? notation. This is useful for temporary storage rather than 
fixed data. 


Table 2-17 
80286 Instructions Not Available on 8086/8088 Processors 

Operation Code    Meaning 
ARPL              Adjust requested privilege level 
BOUND             Detect value out of range 
CLTS              Clear task switched flag 
ENTER             Enter procedure 
IMUL r/m,imm8     Immediate signed multiply 
INS               Input string 
LAR               Load access rights 
LEAVE             Leave procedure 
LGDT              Load global descriptor table register 
LIDT              Load interrupt descriptor table register 
LLDT              Load local descriptor table register 
LMSW              Load machine status word 
LSL               Load segment limit 
LTR               Load task register 
OUTS              Output string 
POPA              Load all user registers from stack 
PUSHA             Store all user registers on stack 
PUSH IMMEDIATE    Store constant on stack 
RCL r/m,imm8      Rotate left through carry by immediate count 
RCR r/m,imm8      Rotate right through carry by immediate count 
ROL r/m,imm8      Rotate left by immediate count 
ROR r/m,imm8      Rotate right by immediate count 
SAL r/m,imm8      Shift left arithmetic by immediate count 
SAR r/m,imm8      Shift right arithmetic by immediate count 
SHL r/m,imm8      Shift left logical by immediate count 
SHR r/m,imm8      Shift right logical by immediate count 
SGDT              Store global descriptor table register 
SIDT              Store interrupt descriptor table register 
SLDT              Store local descriptor table register 
SMSW              Store machine status word 
STR               Store task register 
VERR              Verify read access 
VERW              Verify write access 

Table 2-18
80386 Instructions Not Available on the 80286 Processor

Operation Code      Meaning
BSF                 Bit scan forward (look for first 1 bit, starting from bit 0)
BSR                 Bit scan reverse (look for first 1 bit, starting from the high bit)
BT                  Bit test
BTC                 Bit test and complement
BTR                 Bit test and reset (clear)
BTS                 Bit test and set
CDQ                 Convert dword to qword
CWDE                Convert word to dword sign extended
LFS                 Load pointer into F segment register
LGS                 Load pointer into G segment register
LSS                 Load pointer into S (stack) segment register
MOVSX               Move byte to word or dword (or word to dword) sign extended
MOVZX               Move byte to word or dword (or word to dword) zero extended
SETcc               Set byte from condition code
SHLD                Shift double left
SHRD                Shift double right
We will also use the following operators besides the standard arithmetic symbols:
DUP Duplicate (with data definition directives) 
OFFSET Offset, i.e., the number of bytes between an item and the 
beginning of the segment in which it is defined 
PTR Specify type 
The following are typical directives. Note that labels attached to them are not fol- 
lowed by a colon (for some unknown reason). 


1. SCALE is the address of a byte in memory. Its initial value is 3: 
SCALE DB 3 
2. ERMSG is the address of the first byte of a memory area containing the ASCII
characters for ERROR:
ERMSG DB 'ERROR'
3. COUNT is the address of the low byte of a memory area containing the hexadecimal
number 3000. The high byte of the number is in address COUNT+1:
COUNT DW 3000H
4. SUBTBL is the address of the low byte of the first of four double words with
undefined initial values. This creates a 16-byte temporary storage area:
SUBTBL DD 4 DUP(?)
5. The value of IRATE is the number 7: 
IRATE EQU 7 
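
The storage that these directives reserve can be pictured with a short model. The following Python sketch is only an illustration, not assembler output. It packs the values from examples 1 through 4 in low-byte-first order and zero-fills the SUBTBL area, which is an assumption, since ? leaves the real contents unspecified. The variable names simply mirror the labels.

```python
import struct

# Hypothetical model of the bytes reserved by examples 1 through 4.
scale = struct.pack("<B", 3)          # SCALE  DB 3
ermsg = b"ERROR"                      # ERMSG  DB 'ERROR'
count = struct.pack("<H", 0x3000)     # COUNT  DW 3000H, low byte first
subtbl = bytes(4 * 4)                 # SUBTBL DD 4 DUP(?), zero-filled here

print(count.hex())                    # byte at COUNT, then byte at COUNT+1
print(len(subtbl))                    # size of the temporary area in bytes
```

IRATE does not appear because EQU defines a name for the assembler and reserves no memory.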


Note that you must precede addresses with an OFFSET operator to refer to their 
values within a segment. For example: 
MOV EBX,OFFSET BASE 
This instruction loads register EBX with the offset of address BASE from the
beginning of its segment.





SUMMARY 





The 80386 has many general-purpose user registers. They include an accumulator
EAX, a base register EBX, a count register ECX, a data register EDX, index registers 
EDI and ESI, a base pointer EBP, and a stack pointer ESP. Some registers (EAX, EBX, 
ECX, and EDX) are byte addressable, and all are word addressable to maintain com- 
patibility with previous processors (8088, 8086, and 80286). The 80386 also has a flag 
or status register and several special-purpose registers intended mainly for operating
system use. 

The 80386 can handle many different data types, including bits, bit fields, integers,
decimal numbers, ASCII characters, and floating point and real numbers. Integers, the 
primary data type, can be 8 bits (bytes), 16 bits (words), or 32 bits (double words). 
Most instructions operate on integers. 

The 80386 also has a wide variety of addressing modes. It allows register,
immediate, direct, register indirect, and modes built up from combinations of bases,
indexes, and displacements. One can also multiply the index by a scale factor that can
be 2, 4, or 8.
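
As a rough illustration of how these pieces combine, the Python sketch below computes an effective address from a base, a scaled index, and a displacement. The function name and its defaults are hypothetical. A scale of 1 stands for an unscaled index, and the sum is assumed to wrap at 32 bits.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """Model of a 32-bit effective address: base + index*scale + disp."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF

# Element number 5 of a double word array whose base address is 1000 hex
print(hex(effective_address(base=0x1000, index=5, scale=4)))
```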

The 80386’s instruction set includes the usual data transfer, arithmetic, logical,
program control, and status manipulation instructions. It also has conversion, bit 
manipulation, string, iteration control, and high-level language support instructions, as 
well as instructions specifically intended for protected multitasking operating systems. 
The major peculiarities are: 


1. Moves take the form 
MOV destination,source
in which the destination comes first. 


2. Arithmetic and logical instructions take the form 
Operation code destination,source 


The destination is the primary operand (the minuend in subtraction and comparison). 
It is also the place where the result is stored. 
“Isn’t that lovely?” she sighed. “It’s my favorite
program — fifteen minutes of silence — and after that
there’s a half hour of quiet and then an interlude
of lull.”


Norton Juster, The Phantom Tollbooth 


When in doubt, use brute force. 
Ken Thompson 


Nobody should be allowed to program in assembly language.
Jim Isaak, Computer Design, Jan. 1, 1987. 


This chapter describes assembly language programming for the 80386. With minor ex- 
ceptions, this is the same as programming the 8086 and 80286. We start with simple 
programs and proceed through bit manipulation, shifts, decision making, array process- 
ing, table lookup, string manipulation, arithmetic, and data structure manipulation. 
Final sections discuss parameter passing methods, common programming errors, and 
ways to make programs run faster. The emphasis here is on applications programming
rather than on systems programming. 





80386 HIGHLIGHTS 





For the applications programmer, the 80386 has the following major new features: 
• Bit manipulation instructions. They replace the logical instructions and
shifts previously used for this purpose. The 80386 has bit scans as well
as bit complement, reset, set, and test.

• Double-length shifts. SHLD and SHRD can rapidly shift unaligned data
that occupies several memory locations or registers. The idea is to move
bits from one unit to the next in a single operation.

• Any user register can be an index register (except ESP) or a base pointer.
Compare this to previous processors in which only DI or SI could be
index registers and only BP or BX could be base pointers. For example,
on the 80386, you can use ECX or EAX as index registers or base pointers
in addressing. This may make some register transfers and saving and
restoring operations unnecessary. Be careful: the old restrictions still
apply to 16-bit operations. Note, however, that many registers still have
special uses, such as:

EAX for I/O data, multiplication and division, extension, table
lookup, packed and unpacked decimal operations, and data in
string instructions
EBX for table lookup
ECX as a loop, shift, or bit position counter
EDX for I/O addresses, multiplication, and division
ESI and EDI for pointers in string instructions

Thus you must still allocate registers carefully. Although EAX, ECX,
and EDX may be usable as index registers or base pointers, they may
not be available in practice.

• Scaled index addressing for handling arrays with multibyte elements.
This mode saves a multiplication or shift when accessing elements that
are 2, 4, or 8 bytes long.

The 80386 also shifts and multiplies much faster than earlier processors. A shift’s 
execution time is independent of the number of bit positions shifted. Multiplication is 
much faster because of the early-out algorithm that recognizes when the remaining 
multiplier is zero. The speedup makes the upper parts of registers readily available and
reduces the advantage of using shifts and additions instead of multiplications. 

Another change in the 80386 is the virtual elimination of the time penalty for ad- 
dress calculations. Increased pipelining means that such calculations occur in parallel 
with instruction fetch, instruction execution, and memory addressing. Programmers 
can therefore use complex addressing modes (including scaled indexing) freely. 

The 80386 also has other new instructions that are occasionally useful (see Table 2- 
14). For example, the conversion instructions CDQ, CWDE, MOVSX, and MOVZX 
do both sign and zero extension. SETcc (Set on condition code) is handy for creating 
8-bit Boolean variables used in high-level languages such as C and Pascal. 


SIMPLE PROGRAMS 


Simple 80386 assembly language programs use the following features of the
processor’s instruction set:
• Moves can transfer data to or from a location defined by any addressing
mode. The other operand must be either a register or immediate data.
• Logical and arithmetic instructions (addition, AND, comparison,
EXCLUSIVE OR, OR, and subtraction) also must have one operand that is
either a register or immediate data. The destination may be either a
register or a memory location.
• An instruction may work on 8, 16, or 32 bits. Its length is set by the length
of a source or destination register, by the type of a variable, or by a type
override (BYTE PTR, WORD PTR, or DWORD PTR).


Examples 


1. Logically OR registers AL and BL: 
OR AL,BL 
This is an 8-bit operation because AL and BL are 8-bit registers. The result ends up 
in AL. 
2. Add register EBX to register EAX: 
ADD EAX,EBX 
This is a 32-bit operation because EAX and EBX are 32-bit registers. The sum ends 
up in EAX. 
3. Logically AND register AL with the binary constant BICON:
        AND     AL,BICON
BICON must be an 8-bit constant. Immediate addressing is the default mode. No
special operation code or designator is necessary. TEST is the same as AND except
that it does not change the destination.
4. Logically OR register AL with the data at the address in register EBX:
        OR      AL,[EBX]
The brackets around EBX indicate that it contains an address, not data.
5. Add contents of memory locations OPER1 and OPER2, put sum in memory
location SUM:
        MOV     AL,[OPER1]              ;GET FIRST OPERAND
        ADD     AL,[OPER2]              ;ADD SECOND OPERAND
        MOV     [SUM],AL                ;SAVE SUM
There are no memory-to-memory operations.
We can also use displacements from a base address. That is,
        MOV     EBX,OFFSET OPER1        ;POINT TO BASE ADDRESS
        MOV     AL,[EBX]                ;GET FIRST OPERAND
        ADD     AL,OPER2-OPER1[EBX]     ;ADD SECOND OPERAND
        MOV     SUM-OPER1[EBX],AL       ;SAVE SUM
This is advantageous only if the locations are close together or if the entire data area
could be moved as a unit.
6. Add a constant (VALUE) to the double word at address OPER:
        ADD     DWORD PTR [OPER],VALUE
As neither operand is a register, we need DWORD PTR to indicate a 32-bit
operation. Note that results can go directly into memory.

You can use INC and DEC to add and subtract 1. They can work on memory
directly, but you need a typed variable or PTR to indicate the operation’s length.
Remember that neither INC nor DEC affects Carry.

7. Subtract 1 from the contents of the memory location 5 bytes beyond the address in
EBX:
        DEC     BYTE PTR 5[EBX]
8. Add 1 to the double word at location ADDR:
        INC     DWORD PTR [ADDR]
The only way to recognize a Carry is by examining the Zero flag.
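
The remark about INC and the Zero flag can be illustrated with a small Python model. This is a sketch only. The function name inc32 is hypothetical, and it tracks just the Carry and Zero flags.

```python
MASK32 = 0xFFFFFFFF

def inc32(value, carry):
    """Model INC on a double word: Carry passes through unchanged."""
    result = (value + 1) & MASK32
    zero = result == 0            # Zero is set only when the value wrapped
    return result, carry, zero

result, carry, zero = inc32(0xFFFFFFFF, carry=False)
print(hex(result), carry, zero)
```

A wrap from FFFFFFFF hex to 0 leaves Carry as it was, so the Zero flag is the only sign that the increment overflowed.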
BIT MANIPULATION


The 80386 has special bit test, set, clear (reset), and complement (invert) instructions. 
They all first set the Carry flag from the bit value. The other flags are not affected. The 
bit number can be an immediate value or the contents of a register. Bit manipulation 
instructions apply only to 16- or 32-bit operands. There are no special 8-bit forms. The 
80386 also has instructions for finding the first 1 bit in a word or double word. 


Examples 


1. Set bit 6 of register EDX: 


BTS EDX,6 
2. Clear bit 3 of register EAX: 
BTR EAX,3 


The clear instruction is BTR (R for “reset”).
3. Invert (complement) bit 14 of the double word starting at location ADDR: 
BTC [ADDR],14 
C is for “complement,” not “clear.” Bit manipulation instructions can operate on 
words or double words in memory as well as on 16-bit or 32-bit registers. 
4. Test bit 5 of register EAX. That is, set the Carry flag if bit 5 is 1, and clear it if bit
5 is 0:
BT EAX,5 
Note that all bit manipulation instructions do a test. Thus they all change the Carry 
as well. Watch for this when using BTC, BTR, or BTS. 
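
The four bit instructions differ only in what they do after the test. The following Python sketch models the register forms under that reading. The function name bit_op is hypothetical, and the bit number is taken modulo the operand width, as it is for register operands.

```python
def bit_op(op, value, bit, width=32):
    """Return (new_value, carry) for BT, BTS, BTR, or BTC on a register."""
    bit %= width                    # register forms use the bit number modulo the width
    mask = 1 << bit
    carry = (value >> bit) & 1      # Carry always receives the old bit value
    if op == "BTS":
        value |= mask
    elif op == "BTR":
        value &= ~mask
    elif op == "BTC":
        value ^= mask
    elif op != "BT":
        raise ValueError(op)
    return value & ((1 << width) - 1), carry

print(bit_op("BTS", 0, 6))          # corresponds to BTS EDX,6 with EDX = 0
```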


You can use logical instructions to do more complex bit manipulation (Boolean) 
operations as follows: 
• To set bits, logically OR them with 1s in the required positions.
• To clear bits, logically AND them with 0s in the required positions.
• To complement (invert) bits, logically EXCLUSIVE OR them with 1s
in the required positions.
• To test bits (for all 0s), logically AND them with 1s in the required
positions.
This approach lets you change or test several bits with one instruction. Of course, the 
changes must all be the same type. That is, you cannot set some bits and clear others. 
Examples

1. Set bits 2 and 3 of register AL:
        OR      AL,00001100B
This single instruction replaces two BTS instructions. Besides, logical instructions
can operate on bytes as well as on words and double words.
2. Clear bits 13, 14, and 15 of register ECX:
        AND     ECX,0FFFF1FFFH
This replaces three BTR instructions. Note that it changes all flags, whereas BTR
affects only Carry.
3. Invert bits 1, 5, and 7 of the byte at address ADDR:
        XOR     BYTE PTR [ADDR],10100010B
This replaces three BTC instructions. Note that the bit positions can be scattered
anywhere. XOR can also determine the bit positions in which two numbers differ.
For example,
        XOR     AL,BL
results in a 1 in each bit position in which AL and BL differ.
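
The masks in these examples can be derived mechanically from the bit numbers. The Python sketch below does so and applies them. The helper mask_for is hypothetical, and the starting values are arbitrary.

```python
def mask_for(bits):
    """Build a mask with 1s in the listed bit positions."""
    mask = 0
    for b in bits:
        mask |= 1 << b
    return mask

al = 0b00000000
al |= mask_for([2, 3])                          # OR  AL,00001100B

ecx = 0xFFFFFFFF
ecx &= ~mask_for([13, 14, 15]) & 0xFFFFFFFF     # AND ECX,0FFFF1FFFH

data = 0b00000000
data ^= mask_for([1, 5, 7])                     # XOR BYTE PTR [ADDR],10100010B

differ = 0b1100 ^ 0b1010                        # 1s mark positions that differ
print(bin(al), hex(ecx), bin(data), bin(differ))
```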


A handy shortcut to changing bit 0 of a register or memory location is to use INC
or DEC. INC sets a bit if you know that it is cleared; DEC clears a bit if you know that
it is set. Also, either complements bit 0 if you are not using the other bits. This approach
works if you have used SETcc to give the location a value of 0 or 1.


SHIFT OPERATIONS 


The 80386 has a wide variety of shift instructions. They can work on any register or 
memory location. The number of bits to be shifted can be either an immediate constant 
or the contents of register CL. The instructions are: 


RCL     (rotate left through carry)
RCR     (rotate right through carry)
ROL     (rotate left)
ROR     (rotate right)
SAR     (shift right arithmetic)
SHL     (shift left logical)
SHLD    (double-precision shift left)
SHR     (shift right logical)
SHRD    (double-precision shift right)

RCL and RCR rotate a register or memory location and the Carry flag as a unit.
Figures 3-1 and 3-2 show how this works in an 8-bit case. ROL and ROR rotate the
register or memory location alone; the bit rotated out of one end goes into the Carry
flag as well as into the other end. Figures 3-3 and 3-4 illustrate this. SHL (or SAL) and
SHR are logical shifts that fill the vacated bits with 0s (see Figures 3-5 and 3-6). SAR
copies the sign bit to the right (sign extension) as shown in Figure 3-7. Note that RCL
and RCR preserve the Carry flag (in a data bit), whereas the other shifts destroy it.

Figure 3-1
The RCL (rotate left through carry) instruction in its byte-length form.

Figure 3-2
The RCR (rotate right through carry) instruction in its byte-length form.

Figure 3-3
The ROL (rotate left) instruction in its byte-length form.

Figure 3-4
The ROR (rotate right) instruction in its byte-length form.
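
The differences among these instructions are easy to check with a byte-length model. The Python sketch below covers RCL, ROL, and SAR only. The function names are hypothetical, and each assumes a shift count of at least 1.

```python
def rcl8(value, carry, count=1):
    """Rotate an 8-bit value left through Carry (a 9-bit rotation)."""
    for _ in range(count):
        value = (value << 1) | carry
        carry = (value >> 8) & 1      # old bit 7 becomes the new Carry
        value &= 0xFF
    return value, carry

def rol8(value, count=1):
    """Rotate an 8-bit value left; Carry copies the bit moved around."""
    carry = 0
    for _ in range(count):
        carry = (value >> 7) & 1
        value = ((value << 1) & 0xFF) | carry
    return value, carry

def sar8(value, count=1):
    """Shift an 8-bit value right, copying the sign bit into vacated bits."""
    carry = 0
    for _ in range(count):
        carry = value & 1
        value = (value >> 1) | (value & 0x80)
    return value, carry

print(rcl8(0b10000001, 0), rol8(0b10000001), sar8(0b10000010))
```

Nine RCLs of a byte restore both the data and the Carry flag, which is one way to see that RCL treats them as a single 9-bit unit.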

The double-length shifts SHLD and SHRD (new with the 80386) have the follow- 
ing 32-bit forms: 

SHLD    r/m32,r32,imm8
SHLD    r/m32,r32,CL
The first operand is the register or memory location to be shifted, the second operand
contains the bits to be shifted in (starting with bit 31), and the third operand is the shift
count. The second operand (the source register) is not changed.

Figure 3-5
The SHL (shift logical left) or SAL (shift arithmetic left) instruction in its byte-length form.

Figure 3-6
The SHR (shift logical right) instruction in its byte-length form.

Figure 3-7
The SAR (shift arithmetic right) instruction in its byte-length form.

SHLD and SHRD can shift multiword operands over many bit positions. For
example, say we have an operand stored starting with its low byte at location ADDR. Its
length in double words is COUNT, and we want to shift it left logically 4 bits. The goal
is to move 4 bits quickly from one double word to the next.

The procedure is as follows:

1. Move the low double word to a register. Call it the previous double word.

2. Shift the original low double word left logically 4 bits. The incoming bits are all
zeros here.

3. Move the next higher double word to a register. We must save it before shifting it
for use in the next shift.

4. Use SHLD to shift the original next higher double word left 4 bits, deriving the
incoming bits from the previous double word.

5. Move the original next higher double word to the register that held the previous
double word.

6. Repeat steps 3 through 5 until the entire operand has been shifted.


SHLD’s value here is that it lets us shift several bits from the previous double word 
into the current double word. The program is 
        MOV     ESI,OFFSET ADDR         ;POINT TO LOW DOUBLE WORD
        MOV     ECX,COUNT-1             ;GET NUMBER OF DOUBLE WORDS
                                        ; AFTER FIRST
        CLD                             ;SELECT AUTOINCREMENTING
        LODSD                           ;GET LOW DOUBLE WORD
        SHL     DWORD PTR -4[ESI],4     ;SHIFT LOW DOUBLE WORD
ROTDW:  MOV     EBX,EAX                 ;SAVE PREVIOUS DOUBLE WORD
        LODSD                           ;GET CURRENT DOUBLE WORD
        SHLD    -4[ESI],EBX,4           ;ROTATE CURRENT DOUBLE WORD
        LOOP    ROTDW                   ;COUNT DOUBLE WORDS
This kind of operation is useful in performing multiple-precision arithmetic, moving 
graphics figures, and examining bit patterns for communications applications. 
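
To check the procedure by hand, it helps to model it away from the processor. The Python sketch below follows the same sequence on a list of double words (low one first). The names shld and shift_left_multiword are hypothetical. It assumes at least one double word and a count of 1 through 31.

```python
MASK32 = 0xFFFFFFFF

def shld(dest, src, count):
    """Model SHLD r/m32,r32,imm8 for counts of 1 through 31."""
    return ((dest << count) | (src >> (32 - count))) & MASK32

def shift_left_multiword(dwords, count):
    """Shift a list of double words (low one first) left by count bits."""
    result = list(dwords)
    previous = result[0]                        # LODSD: original low double word
    result[0] = (previous << count) & MASK32    # SHL on the low double word
    for i in range(1, len(result)):
        current = result[i]                     # LODSD: original current double word
        result[i] = shld(current, previous, count)
        previous = current                      # MOV EBX,EAX on the next pass
    return result

operand = [0x89ABCDEF, 0x01234567]              # the value 0123456789ABCDEF hex
print([hex(d) for d in shift_left_multiword(operand, 4)])
```

The point to notice is that each SHLD draws its incoming bits from the unshifted copy of the previous double word, which is why the program saves that copy in EBX before loading the next one.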


MAKING DECISIONS


The major types of decisions in programs are: 

¢ Deciding whether a bit is set or cleared 

¢ Deciding whether two values are equal 

¢ Deciding whether one value is greater than or less than another 

The first type of decision lets the processor test the value of a flag, switch, status
line, or other binary (ON/OFF) input. The second type lets the processor check whether
an input or a result has a specific value. For example, it may want to know whether a
keyboard input is a specific character or whether a result is 0. The third type of decision
lets the processor determine whether a value is valid or is above or below a threshold,
warning level, or set point.
Examples

1. Branch to DEST if AL contains the number VALUE:
        CMP     AL,VALUE                ;DOES AL CONTAIN VALUE?
        JE      DEST                    ;YES, BRANCH
VALUE is a number as it has no special designator.
2. Branch to DEST if AL’s contents are not the same as those of memory location
ADDR:
        CMP     AL,[ADDR]               ;IS AL SAME AS DATA IN MEMORY?
        JNE     DEST                    ;NO, BRANCH
3. Branch to DEST if EAX contains 0:
        TEST    EAX,EAX                 ;SET FLAGS FROM EAX
        JZ      DEST                    ;BRANCH IF EAX CONTAINS ZERO
Zero is a special value. No comparison is necessary. OR EAX,EAX and AND
EAX,EAX have the same effect as TEST EAX,EAX but are not as obvious.
4. Branch to DEST if ECX does not contain -1 (FFFFFFFF hex):
        INC     ECX                     ;ESTABLISH ZERO FLAG
        JNZ     DEST                    ;BRANCH IF ECX WAS NOT -1
+1 and -1 are also special values. Remember that INC does not affect Carry.
5. Branch to DEST if EBX contains 1:
        DEC     EBX                     ;ESTABLISH ZERO FLAG
        JZ      DEST                    ;BRANCH IF EBX WAS 1
6. Branch to DEST if memory location ADDR contains 0:
        MOV     AL,[ADDR]               ;GET VALUE
        TEST    AL,AL                   ;SET FLAGS
        JZ      DEST                    ;BRANCH IF VALUE WAS 0
MOV does not affect the flags. An alternative that does not use a register is
        INC     BYTE PTR [ADDR]         ;ESTABLISH ZERO FLAG IN
        DEC     BYTE PTR [ADDR]         ; TWO STEPS
        JZ      DEST                    ;BRANCH IF VALUE WAS 0
ADDR’s contents are unchanged.
7. Branch to DEST if the double word at address ADDR contains the value VAL32:
        CMP     DWORD PTR [ADDR],VAL32
        JE      DEST
The destination can be in memory as long as the source is either a register or an
immediate value. Of course, you must indicate the amount of data being handled with
a typed variable or a PTR operator.
The way to determine how two operands compare in magnitude is to use CMP. If,
as is usual, the operands are unsigned, the Carry flag indicates which is larger. Its value
is

• 1 if the source is larger than the destination (that is, a borrow is
necessary)
• 0 if the source is less than or equal to the destination


The unsigned conditional jumps are: 

JC (JB) — jump if carry (below), jump if the destination is less than the source 

JBE (JNA) — jump if below or equal (not above), jump if the destination is less 
than or equal to the source 

JNC (JNB) — jump if no carry (not below), jump if the destination is greater than 
or equal to the source 

JA (JNBE) — jump if above (not below or equal), jump if the destination is greater 
than the source 


Examples 


1. Jump to DEST if AL’s contents are greater than or equal to the number VALUE:
        CMP     AL,VALUE                ;IS AL AT OR ABOVE VALUE?
        JAE     DEST                    ;YES, BRANCH
The jump occurs if no borrow is necessary. JAE (jump if above or equal) is another
name for JNC (JNB).
2. Jump to DEST if the double word at address OPER1 is less than the one at address
OPER2:
        MOV     EAX,[OPER1]             ;GET FIRST OPERAND
        CMP     EAX,[OPER2]             ;IS SECOND OPERAND GREATER?
        JB      DEST                    ;YES, BRANCH
The mnemonic JB is equivalent to JC but more descriptive. Note
that CMP cannot compare two direct addresses.
3. Jump to DEST if the double word at address OPER1 is less than or equal to the one
at address OPER2:
        MOV     EAX,[OPER1]             ;GET FIRST OPERAND
        CMP     EAX,[OPER2]             ;IS SECOND OPERAND GREATER
                                        ; OR SAME?
        JBE     DEST                    ;YES, BRANCH
4. Jump to DEST if the contents of register EDI are greater than or equal to VAL32:
        CMP     EDI,VAL32               ;IS EDI AT OR ABOVE VAL32?
        JAE     DEST                    ;YES, BRANCH
CMP can use any register, but there are often special short forms for the accumulator.
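
The unsigned jump conditions reduce to tests of the Carry and Zero flags. The following Python sketch models CMP for unsigned operands and the four conditions listed earlier. The function names are hypothetical, and only these two flags are tracked.

```python
def cmp_unsigned(dest, src, width=8):
    """Model CMP dest,src: return (carry, zero) for unsigned operands."""
    mask = (1 << width) - 1
    dest &= mask
    src &= mask
    carry = dest < src              # a borrow is necessary
    zero = dest == src
    return carry, zero

def taken(jump, carry, zero):
    """Decide whether an unsigned conditional jump is taken."""
    return {
        "JB": carry,                        # also JC
        "JBE": carry or zero,               # also JNA
        "JAE": not carry,                   # also JNC, JNB
        "JA": not carry and not zero,       # also JNBE
    }[jump]

print(taken("JAE", *cmp_unsigned(0x41, 0x41)))
```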


With signed operands, we must account for 2’s complement overflow. This is the 
case in which the difference between the operands affects the sign bit. That is, the result 
is outside the signed range for the specified number of bits. 

If overflow is possible, we must use signed conditional jumps. These are: 

JG (JNLE) — jump if greater, jump if destination is greater than source in the signed 
sense 

JGE (JNL) — jump if greater or equal, jump if destination is greater than or equal 
to source in the signed sense 

JLE (JNG) — jump if less or equal, jump if destination is less than or equal to source 
in the signed sense 

JL (JNGE) — jump if less, jump if destination is less than source in the signed sense 
These instructions test for overflow automatically. 


Examples 


1. Jump to DEST if AL’s signed contents are greater than or equal to the signed
number VALUE:
        CMP     AL,VALUE                ;IS AL AT LEAST VALUE?
        JGE     DEST                    ;YES, BRANCH
2. Jump to DEST if the signed double word at address OPER1 is less than the one at
address OPER2:
        MOV     EAX,[OPER1]             ;GET FIRST OPERAND
        CMP     EAX,[OPER2]             ;IS SECOND OPERAND GREATER?
        JL      DEST                    ;YES, BRANCH
3. Jump to DEST if the signed double word at address OPER1 is less than or equal to
the one at address OPER2:
        MOV     EAX,[OPER1]             ;GET FIRST OPERAND
        CMP     EAX,[OPER2]             ;IS SECOND OPERAND GREATER
                                        ; OR SAME?
        JLE     DEST                    ;YES, BRANCH
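
The signed jumps account for overflow by comparing the Sign and Overflow flags rather than testing Sign alone. The Python sketch below models that rule for CMP. The function names are hypothetical, and operands are passed as raw bit patterns of the given width.

```python
def cmp_signed_flags(dest, src, width=8):
    """Model CMP dest,src: return the (sign, overflow, zero) flags."""
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    dest &= mask
    src &= mask
    result = (dest - src) & mask
    sign = bool(result & sign_bit)
    # Overflow: the operands differ in sign and the result's sign differs from dest's
    overflow = bool((dest ^ src) & (dest ^ result) & sign_bit)
    return sign, overflow, result == 0

def taken_signed(jump, sign, overflow, zero):
    """Decide whether a signed conditional jump is taken."""
    return {
        "JL": sign != overflow,                 # also JNGE
        "JGE": sign == overflow,                # also JNL
        "JLE": zero or sign != overflow,        # also JNG
        "JG": not zero and sign == overflow,    # also JNLE
    }[jump]

# -128 compared with +1 overflows, yet JL still gives the right answer
print(taken_signed("JL", *cmp_signed_flags(0x80, 0x01)))
```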


The simplest way to repeat a sequence of instructions with the 80386 is as follows: 


1. Load register ECX with the number of repetitions. 

2. Do the sequence. 

3. Use the LOOP instruction to subtract 1 from ECX and return to step 2 if the result
is non-zero.


LOOP is handy because it combines a decrement and a conditional jump. Note that 
it always uses register ECX. The LOOPZ (LOOPE) and LOOPNZ (LOOPNE) varia- 
tions also exit if their condition is not satisfied. 

Typical programs have the following structure: 

        MOV     ECX,NTIMES              ;NTIMES = NUMBER OF ITERATIONS
START:  Instructions to be repeated
        LOOP    START                   ;COUNT ITERATIONS
You can put JECXZ EXIT after the MOV to exit immediately if NTIMES is zero. This
eliminates a potential source of error at a small cost.
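
The error in question is that the loop body runs before LOOP tests the count, so a starting count of zero wraps around instead of skipping the loop. The Python sketch below counts the passes. The function name and the width parameter are hypothetical. The width is adjustable only so that the zero case can be demonstrated quickly.

```python
def simulate_loop(count, width=32):
    """Count passes of a body that ends in LOOP, stepping the counter as LOOP does."""
    mask = (1 << width) - 1
    counter = count & mask
    passes = 0
    while True:
        passes += 1                         # the body runs before LOOP tests anything
        counter = (counter - 1) & mask      # LOOP decrements the count register
        if counter == 0:                    # and falls through when it reaches zero
            return passes

print(simulate_loop(3))
print(simulate_loop(0, width=4))            # a 4-bit counter stands in for ECX here
```

With the full 32-bit ECX, a zero count would therefore give 2 to the 32nd power passes, which is the case that JECXZ guards against.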

By default, LOOP requires an 8-bit relative offset. However, you can use DWORD 
to override the default and allow a 16-bit relative offset. The instruction would then be 

LOOP DWORD START ;COUNT ITERATIONS 

We can, of course, use other registers for counting or count up rather than down. 
These alternatives require different initializations, explicit INC or DEC instructions,
and a JNZ at the end. In any case, the instructions to be repeated must not interfere with
counting the iterations. Note that register ECX is special, and most programmers
reserve it as a loop counter.


ARRAY MANIPULATION 


The easiest way to access a particular element of an array is by placing its address in 
an index or base register. In this way, we can: 
• Manipulate the element by referring to it indirectly, that is, as [reg].
• Access nearby elements with appropriate displacements, that is, as
disp[reg].
Examples

1. Add an 8-bit element of an array to AL. Assume that the element’s address is in
register EBX. Add 1 to EBX afterward so that it contains the address of the next
8-bit element:
        ADD     AL,[EBX]                ;ADD CURRENT ELEMENT
        INC     EBX                     ;POINT TO NEXT ELEMENT
The procedure is the same for 16-bit and 32-bit elements, except that more INCs or
an ADD are necessary. Note that the 80386 has no explicit autoincrementing or
autodecrementing.
2. Load EAX with the thirty-fifth element from an array of double words. Assume that
the array’s base address is in register EBX:
        MOV     ESI,35                  ;GET OFFSET FOR ELEMENT
        MOV     EAX,[EBX+4*ESI]         ;OBTAIN ELEMENT
Scaled indexing is useful for 16-, 32-, and 64-bit elements. Remember that the
scaling factor can only be 2, 4, or 8.
3. Exchange an element of an array with its successor if the two are not already in
descending order. Assume that the elements are 8-bit unsigned numbers and that the
address of the current element is in register EBX. Make EBX contain the address of
the successor element afterward:
        MOV     AL,[EBX]                ;GET CURRENT ELEMENT
        CMP     AL,1[EBX]               ;COMPARE TO SUCCESSOR
        JAE     DONE                    ;DONE IF IN ORDER
        XCHG    AL,1[EBX]               ;REPLACE SUCCESSOR
        MOV     [EBX],AL                ;REPLACE CURRENT ELEMENT
DONE:   INC     EBX                     ;UPDATE POINTER
This type of operation is useful in exchange sorts.
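
To show where the exchange step fits, the Python sketch below wraps it in a simple exchange sort. Both function names are hypothetical, and the repeat-until-unchanged loop is one possible driver, not one given in the text.

```python
def exchange_step(array, i):
    """Swap array[i] and array[i+1] unless they are already in descending order."""
    current = array[i]                  # MOV AL,[EBX]
    if current < array[i + 1]:          # CMP AL,1[EBX] / JAE DONE
        array[i], array[i + 1] = array[i + 1], current   # XCHG, then MOV
    return i + 1                        # INC EBX

def sort_descending(array):
    """Repeat passes of exchange_step until a pass changes nothing."""
    changed = True
    while changed:
        before = list(array)
        i = 0
        while i < len(array) - 1:
            i = exchange_step(array, i)
        changed = array != before
    return array

print(sort_descending([3, 9, 1, 7]))
```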


An important task in array manipulation is bounds checking. It involves
determining whether an index is valid, that is, above or equal to a lower bound and below or
equal to an upper bound. Bounds checking ensures that the processor only reads or
writes valid elements. Otherwise, an array operation could refer accidentally to
unrelated data.


The 80386 has a special instruction BOUND for bounds checking. Its form is 
BOUND r32,m32 


The result is a system jump (called a trap and discussed in Chapter 7) if r32 < [m32]
or r32 > [m32+4]. That is, a special event (or exception) occurs if the index is below
the lower bound or above the upper bound. The lower bound is at the specified address
and the upper bound comes immediately afterward. The trap is interrupt 5 (see
Chapter 7). Note that the index and the bounds can be in any units (bytes, words, or double
words).

In practice, designers usually put the bounds just ahead of the array itself. This in- 
creases the array’s storage requirements by 8 bytes. 
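
The check that BOUND performs can be sketched in a few lines of Python. The exception class merely stands in for the interrupt 5 trap, and all names are hypothetical. Both bounds are inclusive, as in the condition given above.

```python
class BoundRangeExceeded(Exception):
    """Stands in for the interrupt 5 trap."""

def bound(index, bounds):
    """Model BOUND r32,m32: bounds holds (lower, upper), both inclusive."""
    lower, upper = bounds
    if index < lower or index > upper:
        raise BoundRangeExceeded(index)
    return index

LIMITS = (0, 9)                 # the pair stored just ahead of a ten-element array
print(bound(9, LIMITS))
```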


TABLE LOOKUP 


If the table’s position in memory is fixed, we can access it by using its base address as 
a displacement. The element number then goes in an index or base register. Scaled in- 
dexing lets us access tables with multibyte elements. 

If the table’s position is not fixed, its base address should go in a base register and 
the element number in an index register. The special instruction XLAT (or XLATB) 
can access a table with 8-bit indexes and elements. 


Examples 


1. Load register AL with an element from a table. Assume that the table starts at BASE
and the index is in memory location INDEX:
        MOV     AL,[INDEX]              ;GET 8-BIT INDEX
        MOV     EBX,OFFSET BASE         ;GET BASE ADDRESS
        XLAT                            ;GET 8-BIT ELEMENT
XLAT (translate) adds AL and EBX, then uses the sum as the address from which
to load AL. This highly specialized instruction thus does a table lookup, assuming
an 8-bit index and 8-bit elements. EBX is unaffected, so it can be used again later,
but the index in AL is destroyed. Note that we do not have to extend AL to 32 bits
explicitly.
2. Load EAX with an element from a table. Assume that its base address is BASE (a
constant) and the index is in locations INDEX through INDEX+3:
        MOV     ESI,[INDEX]             ;GET 32-BIT INDEX
        MOV     EAX,BASE[ESI*4]         ;GET 32-BIT ELEMENT
Scaled indexing is convenient for handling multibyte elements.
3. Load EBX with an element from a table. Assume that its base address is in locations
BASE through BASE+3 and the index is in locations INDEX through INDEX+3:
        MOV     EBX,[BASE]              ;GET BASE ADDRESS
        MOV     ESI,[INDEX]             ;GET 32-BIT INDEX
        MOV     EBX,[EBX+ESI*4]         ;GET 32-BIT ELEMENT
Here we use a base register, an index register, and scaled indexing. The
displacement is zero.
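
Both kinds of lookup can be modeled over a flat block of memory. In the Python sketch below, the function names and the table addresses (16 and 32) are hypothetical. Double words are read low byte first.

```python
import struct

def xlat(memory, ebx, al):
    """Model XLAT: AL is an unsigned 8-bit index added to the base in EBX."""
    return memory[ebx + (al & 0xFF)]

def load_dword(memory, base, index):
    """Model MOV EAX,BASE[ESI*4]: fetch a double word, low byte first."""
    return struct.unpack_from("<I", memory, base + index * 4)[0]

memory = bytearray(64)
memory[16:20] = b"0123"                             # byte table at address 16
struct.pack_into("<3I", memory, 32, 10, 20, 30)     # double word table at address 32

print(chr(xlat(memory, 16, 2)), load_dword(memory, 32, 1))
```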


Jump tables require extra caution. On the 80386, jumps work like other instructions 
when using based and indexed addressing. That is, the destination is the contents of 
the effective address, not the effective address itself. The approach here is different 
from that used on many other processors, such as the Motorola 68000. 


Example 
Transfer control (jump) to a 32-bit address obtained from a table. Assume that the base
address of the table is BASE (a constant) and the index is in locations INDEX through
INDEX+3:
        MOV     ESI,[INDEX]             ;GET INDEX
        JMP     BASE[ESI*4]             ;JUMP INDIRECTLY TO
                                        ; DESTINATION


Be careful here. You might think that the destination is the address BASE + ESI*4.
It isn’t. The destination is the contents of that address. Note, for example, the difference
between JMP EBX, which jumps to the address in EBX, and JMP [EBX], which jumps
to the address at the address in EBX (that is, indirectly through EBX).

The common uses of jump tables are to implement CASE or SWITCH statements
(multiway branches in high-level languages such as Ada, C, FORTRAN, Pascal, and
PL/I), to decode commands from a keyboard, and to respond to function keys on a
terminal. Other uses of jump tables are in selecting I/O drivers, graphics functions,
decoding routines, or mathematical methods.
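
In a high-level language, the same idea appears as a table of routines selected by an index. The Python sketch below is a loose analogy with hypothetical command handlers. The range check is an addition that the JMP instruction itself does not perform.

```python
def handle_add():
    return "add"

def handle_delete():
    return "delete"

def handle_quit():
    return "quit"

# The table holds the destinations; the index only selects an entry.
JUMP_TABLE = [handle_add, handle_delete, handle_quit]

def dispatch(index):
    """Model JMP BASE[ESI*4]: fetch the table entry, then transfer to it."""
    if not 0 <= index < len(JUMP_TABLE):
        raise IndexError("command index out of range")
    destination = JUMP_TABLE[index]     # the contents of the entry, not its position
    return destination()

print(dispatch(1))
```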


CHARACTER MANIPULATION 


The 80386 can manipulate characters as unsigned 8-bit numbers. The letters and digits 
form ordered subsequences of the ASCII character set. 
Examples

1. Jump to address DEST if AL contains ASCII E:
        CMP     AL,'E'                  ;IS DATA ASCII E?
        JE      DEST                    ;YES, BRANCH
2. Load register AL with the next character in a string. Assume that the address of the
current character is in register ESI. Add 1 to ESI afterward:
        MOV     AL,[ESI]                ;GET NEXT CHARACTER
        INC     ESI                     ;MOVE POINTER
3. If register AL contains a nonblank character, store it at the address in EDI and add
1 to EDI afterward:
        CMP     AL,' '                  ;IS CHARACTER A BLANK?
        JE      NEXTCH
        MOV     [EDI],AL                ;NO, SAVE IT IN STRING
        INC     EDI                     ;MOVE POINTER
NEXTCH: NOP
4. Jump to address DEST if register AL contains a letter between A and F inclusive:
        CMP     AL,'A'                  ;IS DATA BELOW A?
        JB      DONE                    ;YES, DONE
        CMP     AL,'F'                  ;IS DATA A THROUGH F?
        JBE     DEST                    ;YES, BRANCH
DONE:   NOP                             ;COME HERE IF NOT A...F
The letters A through F form an ordered subsequence in ASCII (41 through 46 hex).
These instructions can validate a hex letter digit.


You can often process strings or arrays faster by using string instructions. As 
described in Chapter 2, they combine a string operation with the updating of ESI, EDI, 
or both. The update’s sign depends on the D flag (D = 0 for autoincrement or 1 for 
autodecrement). 


The 80386 string primitives are: 

CMPS: compare strings. Subtract the operand addressed via EDI from the one 
addressed via ESI and set the flags accordingly. Neither operand changes. 

INS: input string. Load data from an input port (addressed via DX) into the 
memory location addressed via EDI. 

LODS: load string. Load the accumulator from the memory location addressed 
via ESI. 

MOVS: move string. Move data from the memory location addressed via ESI to 
the one addressed via EDI. 

OUTS: output string. Send data from the memory location addressed via ESI to 
an output port (addressed via DX). 

SCAS: scan string. Subtract the contents of the memory location addressed via 
EDI from the accumulator and set the flags accordingly. Neither operand changes. 

STOS: store string. Store the contents of the accumulator at the memory location 
addressed via EDI. 

Note that CMPS and MOVS update both ESI and EDI. String primitives have byte, 
word, and double word versions. The versions all update the pointers by the step re- 
quired to reach the next element (1 for byte versions, 2 for word versions, and 4 for 
double word versions). 


Examples 


1. Load AL from the address in ESI, then increase ESI by 1: 

        CLD                     ;SELECT AUTOINCREMENTING
        LODSB                   ;GET DATA AND UPDATE POINTER

2. Move a double word from the address in ESI to the one in EDI, then decrease both 
ESI and EDI by 4: 

        STD                     ;SELECT AUTODECREMENTING
        MOVSD                   ;MOVE A DOUBLE WORD AND UPDATE
                                ; BOTH POINTERS

3. Compare the character in AL to the one at the address in EDI. Increase EDI by 1 
and jump to DEST if the characters are the same: 

        CLD                     ;SELECT AUTOINCREMENTING
        SCASB                   ;COMPARE CHARACTERS AND UPDATE
                                ; POINTER
        JE      DEST            ;JUMP IF THE SAME


The REP prefix repeats a string instruction while counting down register ECX. As 
noted in Chapter 2, the sequence of events is: 


1. Check if ECX contains zero and exit if it does. 
2. Do the string instruction. 
3. Subtract 1 from ECX and return to step 1. 

Examples 


1. Move a block of data from addresses starting at STR1 to ones starting at STR2. The 
length of the block is in locations LEN through LEN+3: 

        MOV     ESI,STR1        ;BASE ADDRESS OF SOURCE
        MOV     EDI,STR2        ;BASE ADDRESS OF DESTINATION
        MOV     ECX,[LEN]       ;BLOCK LENGTH
        CLD                     ;SELECT AUTOINCREMENTING
        REP     MOVSB           ;MOVE DATA BLOCK

MOVSB with the REP prefix forms an implicit loop. It updates the pointers and 
counters and jumps back automatically. Note that you should consider REP and the 
subsequent string instruction as a unit for debugging and documentation purposes. 

2. Find the next delimiter character (DELIM) in a string starting at the address in EDI. 
The length of the string is in register ECX. Jump to address DONE if the string has 
no delimiter: 

        CLD                     ;SELECT AUTOINCREMENTING
        MOV     AL,DELIM        ;GET DELIMITER CHARACTER
        REPNE   SCASB           ;LOOK FOR DELIMITER CHARACTER
        JNE     DONE            ;JUMP IF NO DELIMITER


REP prefixes and string primitives are very useful in parsing command lines. 
REPNE repeats the string instruction as long as the Zero flag is 0 and ECX has 
not been decremented to zero. Note that the processor tests the Zero flag at the end 
of each iteration, whereas it tests ECX at the beginning. 


After the implicit loop, the Zero flag indicates the reason for the exit. Z = 1 if the exit 
occurred because SCASB found a delimiter; Z = 0 if it occurred because ECX 
was decremented to 0. ECX contains the number of characters left in the unparsed 
part of the string. 
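
The order of the tests is easier to see when the implicit loop is written out. The following C function is a rough model of REPNE SCASB with D = 0, not processor documentation. The structure and function names are made up for this sketch, and zf_in stands for whatever the Zero flag held before the instruction.

```c
#include <stdint.h>
#include <stddef.h>

/* State left behind by the modeled REPNE SCASB. */
struct scan_result {
    const uint8_t *edi;  /* pointer after the last byte examined */
    uint32_t ecx;        /* count remaining */
    int zf;              /* 1 if the last comparison matched */
};

struct scan_result repne_scasb(const uint8_t *edi, uint32_t ecx,
                               uint8_t al, int zf_in)
{
    struct scan_result r = { edi, ecx, zf_in };
    while (r.ecx != 0) {          /* step 1: test ECX before each iteration */
        r.zf = (*r.edi == al);    /* step 2: SCASB compares AL with [EDI] */
        r.edi++;                  /*         and advances EDI (D = 0) */
        r.ecx--;                  /* step 3: count down */
        if (r.zf)                 /* REPNE stops once Z = 1 */
            break;
    }
    return r;
}
```

Scanning the 9-character string "DIR A:*.*" for a blank stops with Z = 1, ECX = 5, and EDI pointing at the A. Note that when ECX is zero on entry the loop body never runs, so the Zero flag keeps its earlier value.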


CODE CONVERSION 


You can convert data between codes using arithmetic or logical operations for simple 
cases or lookup tables for complex cases. 

Examples 


1. Convert an ASCII digit in AL to its binary-coded-decimal (BCD) equivalent: 

        SUB     AL,'0'          ;CONVERT ASCII TO BCD

or 

        AND     AL,11001111B    ;CONVERT ASCII TO BCD

The ASCII digits form an ordered sequence (30 to 39 hex). To jump to ERROR if 
AL is out of range, use 

        SUB     AL,'0'          ;CONVERT ASCII TO BCD
        JC      ERROR           ;ERROR IF BELOW ASCII ZERO
        CMP     AL,9
        JA      ERROR           ;ERROR IF ABOVE ASCII NINE

2. Convert a decimal digit in AL to its ASCII equivalent: 

        ADD     AL,'0'          ;CONVERT BCD TO ASCII

or 

        OR      AL,00110000B    ;CONVERT BCD TO ASCII

To jump to ERROR if AL is out of range, use 

        CMP     AL,9
        JA      ERROR           ;ERROR IF ABOVE NINE
        ADD     AL,'0'          ;CONVERT BCD TO ASCII

3. Convert one 8-bit code to another using a lookup table. Assume that the lookup table 
starts at address NEWCD and the original code is in address CODE: 

        MOV     AL,[CODE]       ;GET OLD CODE
        MOV     EBX,NEWCD       ;GET BASE ADDRESS OF TABLE
        XLAT                    ;CONVERT TO NEW CODE

This approach could convert ASCII to EBCDIC. 
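
The range check in Example 1 can be sketched in C as follows. This is an illustrative model rather than a translation: the function name is hypothetical, and it uses a single unsigned comparison where the assembly sequence uses two jumps. That works because the 8-bit subtraction wraps around for characters below ASCII zero, so they also end up above 9.

```c
#include <stdint.h>

/* Returns 0..9 for '0'..'9', or -1 when the character is not a digit. */
int ascii_to_bcd(uint8_t al)
{
    uint8_t diff = (uint8_t)(al - '0');  /* SUB AL,'0' wraps below '0' */
    return (diff > 9) ? -1 : diff;       /* unsigned test covers both ends */
}
```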





MULTIPLE-PRECISION ARITHMETIC 


Multiple-precision arithmetic requires a series of operations. For example, multiple- 
precision addition or subtraction involves the following steps: 


1. Clear Carry initially, as there is never a carry into or borrow from the low byte. 

2. Add or subtract corresponding units (bytes, words, or double words) using the Add 
with Carry or Subtract with Borrow instruction. 

3. Repeat step 2 until the entire numbers are added or subtracted. 

For example, the following program does 128-bit addition as four 32-bit operations: 

        MOV     ECX,4           ;NUMBER OF DOUBLE WORDS = 4
        MOV     EDI,NUM1        ;POINT TO LOW BYTES OF NUMBERS
        MOV     ESI,NUM2
        CLC                     ;CLEAR CARRY INITIALLY
        CLD                     ;SELECT AUTOINCREMENTING
ADD32:  LODSD                   ;GET 32 BITS FROM OPERAND 2
        ADC     EAX,[EDI]       ;ADD 32 BITS FROM OPERAND 1
        STOSD                   ;STORE RESULT OVER OPERAND 1
        LOOP    ADD32           ;COUNT DOUBLE WORDS
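
The carry chain may be easier to follow in a high-level sketch. The C function below models the same four-step addition under the assumption that both numbers are stored least significant double word first. The names add128 and NDWORDS are invented for the illustration.

```c
#include <stdint.h>

#define NDWORDS 4  /* 4 x 32 bits = 128 bits */

/* num1 += num2, least significant double word first; returns final carry. */
unsigned add128(uint32_t num1[NDWORDS], const uint32_t num2[NDWORDS])
{
    unsigned carry = 0;                       /* CLC */
    for (int i = 0; i < NDWORDS; i++) {       /* LOOP ADD32 */
        uint64_t sum = (uint64_t)num1[i] + num2[i] + carry;  /* ADC */
        num1[i] = (uint32_t)sum;              /* STOSD */
        carry = (unsigned)(sum >> 32);        /* carry into next double word */
    }
    return carry;
}
```

The assembly version works only because LODSD, STOSD, and LOOP leave the Carry flag alone, so the carry from one ADC survives until the next.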


Decimal operations must proceed a byte at a time, as packed and unpacked decimal 
instructions (DAA, DAS, AAA, AAD, AAM, and AAS) operate only on 8-bit results. 
For example, the following program adds 20-digit packed decimal numbers (2 digits 
per byte): 

        MOV     ECX,10          ;NUMBER OF BYTES = 10
        MOV     EDI,NUM1        ;POINT TO LOW BYTES OF NUMBERS
        MOV     ESI,NUM2
        CLC                     ;CLEAR CARRY INITIALLY
        CLD                     ;SELECT AUTOINCREMENTING
ADDBYT: LODSB                   ;GET TWO DIGITS FROM OPERAND 2
        ADC     AL,[EDI]        ;ADD TWO DIGITS FROM OPERAND 1
        DAA                     ;MAKE SUM DECIMAL
        STOSB                   ;STORE SUM OVER OPERAND 1
        LOOP    ADDBYT          ;COUNT BYTES

DAA converts a binary sum in AL to a decimal sum. 


DATA STRUCTURE MANIPULATION 


To handle general data structures (lists, queues, stacks, etc.) on the 80386, we use the 
procedures described earlier for array manipulation, table lookup, and string process- 
ing. The major limitation is the lack of multilevel indirect addressing. However, the 
LEA (load effective address) instruction can provide this kind of addressing in a series 
of steps. For more details on data structures, see the books by A. Tenenbaum and M. 
Augenstein such as Data Structures Using Pascal, 2nd ed. (Englewood Cliffs, NJ: 
Prentice-Hall, 1986). 

Examples 


1. Get the next element in a linked list. The address of the current element is in register 
EBX. Each element has a link to its successor in its first 4 bytes. Put the new address 
in ESI: 

        MOV     ESI,[EBX]       ;REPLACE POINTER WITH LINK

You can use a similar procedure to remove an element from a linked list. The only 
addition is that you must link its predecessor to its successor (thus unlinking it). The 
sequence is 

        MOV     ESI,[EBX]       ;GET ELEMENT TO BE REMOVED
        MOV     EAX,[ESI]       ;MOVE LINK FROM REMOVED
        MOV     [EBX],EAX       ; ELEMENT TO ITS PREDECESSOR

More generally, if the link is at offset LINK in each element, the sequence is 

        MOV     ESI,LINK[EBX]   ;GET ELEMENT TO BE REMOVED
        MOV     EAX,LINK[ESI]   ;MOVE LINK FROM REMOVED
        MOV     LINK[EBX],EAX   ; ELEMENT TO ITS PREDECESSOR

In practice, you should also test the link to be sure that a successor exists. Remember 
that MOV does not affect the flags. 

2. Insert an element in a linked list. Assume that the element’s address is in ESI and 
the address of the preceding element is in EBX. Link the new element to its predecessor 
and to its predecessor’s successor: 

        MOV     EAX,[EBX]       ;MOVE OLD LINK TO NEW ELEMENT
        MOV     [ESI],EAX
        MOV     [EBX],ESI       ;MAKE NEW ELEMENT
                                ; INTO NEW LINK

This procedure lets you add a new element to a linked list. The new element is now 
the successor of its predecessor, and the predecessor’s former successor is now the 
new element’s successor. More generally, if the links are at offset LINK in each element, 
the procedure is 

        MOV     EAX,LINK[EBX]   ;MOVE OLD LINK TO NEW ELEMENT
        MOV     LINK[ESI],EAX
        MOV     LINK[EBX],ESI   ;MAKE NEW ELEMENT
                                ; INTO NEW LINK
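
The two linked-list operations can be sketched in C to show which pointer each MOV copies. The node layout here is hypothetical (a link followed by one data field), and remove_after includes the successor test that the text recommends adding in practice.

```c
#include <stddef.h>

/* Hypothetical element layout: the link comes first, as in the
   examples where the link occupies the first 4 bytes. */
struct node {
    struct node *link;
    int value;
};

/* Insert new_elem after pred (pred plays EBX, new_elem plays ESI). */
void insert_after(struct node *pred, struct node *new_elem)
{
    new_elem->link = pred->link;  /* MOV EAX,[EBX] / MOV [ESI],EAX */
    pred->link = new_elem;        /* MOV [EBX],ESI */
}

/* Unlink and return the successor of pred, or NULL if there is none. */
struct node *remove_after(struct node *pred)
{
    struct node *removed = pred->link;   /* MOV ESI,[EBX] */
    if (removed == NULL)                 /* no successor exists */
        return NULL;
    pred->link = removed->link;          /* MOV EAX,[ESI] / MOV [EBX],EAX */
    return removed;
}
```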


3. Add an element to a stack. Assume that the address of the next empty stack location 
is in locations SPTR through SPTR+3 and the new element is in register EAX. Assume 
also that the stack grows up in memory (toward higher addresses). This is the 
opposite of the hardware stack, which grows down in memory: 

        MOV     EDI,[SPTR]      ;GET STACK POINTER
        CLD                     ;SELECT AUTOINCREMENTING
        STOSD                   ;SAVE ELEMENT IN MEMORY
        MOV     [SPTR],EDI      ;UPDATE STACK POINTER

In practice, we must check for stack overflow by comparing the stack pointer to an 
upper limit. 

4. Remove an element from a stack. Assume that the address of the next empty location 
is in locations SPTR through SPTR+3. Put the element in register EAX. Assume 
also that the stack grows up in memory (toward higher addresses): 

        MOV     EDI,[SPTR]      ;GET STACK POINTER
        SUB     EDI,4           ;POINT TO HIGHEST OCCUPIED
                                ; LOCATION
        MOV     EAX,[EDI]       ;GET ELEMENT FROM STACK
        MOV     [SPTR],EDI      ;UPDATE STACK POINTER

The lack of predecrementing is a nuisance here. In practice, we must check for stack 
underflow by comparing the stack pointer to a lower limit. 


Program Examples 


1. 8-bit maximum value. Assume that register EBX contains the array’s base address 
and register EAX contains its size in bytes. The maximum value ends up in register 
AL. 


;EXAMINE ELEMENTS ONE AT A TIME, COMPARING EACH
; ONE'S VALUE WITH CURRENT MAXIMUM AND ALWAYS
; KEEPING LARGER VALUE AND ITS ADDRESS.
;IN THE FIRST ITERATION, TAKE THE FIRST ELEMENT
; AS THE CURRENT MAXIMUM
;
        MOV     ECX,EAX         ;SAVE NUMBER OF ELEMENTS
        CLD                     ;SELECT AUTOINCREMENTING
        MOV     EDI,EBX         ;SET POINTER AS IF PROGRAM HAD
                                ; JUST EXAMINED THE FIRST
                                ; ELEMENT AND FOUND IT TO BE
                                ; LARGER THAN PREVIOUS
                                ; MAXIMUM
        INC     EDI
MAXLP:  MOV     EBX,EDI         ;SAVE ADDRESS OF ELEMENT JUST
                                ; EXAMINED AS ADDRESS OF
                                ; MAXIMUM
        DEC     EBX
        MOV     AL,[EBX]        ;SAVE ELEMENT JUST EXAMINED
                                ; AS MAXIMUM
;
;COMPARE CURRENT ELEMENT TO MAXIMUM
;KEEP LOOKING UNLESS CURRENT ELEMENT IS LARGER
;
MAXLP1: DEC     ECX             ;COUNT ELEMENTS
        JZ      EXITLP          ;JUMP (EXIT) IF ALL ELEMENTS
                                ; EXAMINED
        SCASB                   ;COMPARE CURRENT ELEMENT TO
                                ; MAXIMUM, ALSO MOVE POINTER
                                ; TO NEXT ELEMENT
        JAE     MAXLP1          ;CONTINUE UNLESS CURRENT
                                ; ELEMENT IS LARGER
        JB      MAXLP           ;ELSE CHANGE MAXIMUM. TO FIND
                                ; LAST OCCURRENCE RATHER
                                ; THAN FIRST OCCURRENCE,
                                ; CHANGE THE CONDITIONAL
                                ; JUMPS TO JA AND JBE
EXITLP: NOP                     ;EXIT WITH LARGEST ELEMENT IN
                                ; AL AND ITS ADDRESS IN EBX
2. Length of a character string. Assume that the base address of the string is in register 
EDI and the terminating character is in register AL. The length (in register EAX) 
does not include the terminator: 

        MOV     ECX,-1          ;START STRING LENGTH AT -1
        CLD                     ;SET AUTOINCREMENTING
        REPNE   SCASB           ;KEEP CHECKING CHARACTERS
                                ; UNTIL A TERMINATOR APPEARS.
                                ;CONTINUOUS DECREMENTS OF
                                ; ECX MEAN THAT IT CONTAINS
                                ; -2-LENGTH AT THE END
        MOV     EAX,-2          ;COMPUTE STRING LENGTH
        SUB     EAX,ECX         ; IN REGISTER EAX

Typical terminators are an ASCII carriage return and an ASCII NUL (0). A quicker 
way to compute the string length is with the sequence 

        INC     ECX             ;COMPUTE STRING LENGTH IN ECX
        NOT     ECX
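
The counter arithmetic is the subtle part of this example, so here is a small C model of it. It assumes, as the assembly version does, that the terminator really occurs in the string. The function name is invented, and 32-bit unsigned arithmetic stands in for ECX.

```c
#include <stdint.h>

/* Model of: MOV ECX,-1 / REPNE SCASB / INC ECX / NOT ECX.
   Assumes the terminator really occurs in the string. */
uint32_t scan_length(const uint8_t *edi, uint8_t terminator)
{
    uint32_t ecx = 0xFFFFFFFFu;        /* MOV ECX,-1 */
    uint8_t current;
    do {
        current = *edi++;              /* SCASB examines one byte */
        ecx--;                         /* REPNE counts it */
    } while (current != terminator);
    /* Here ecx = -2 - length (mod 2^32). */
    ecx++;                             /* INC ECX -> -1 - length */
    return ~ecx;                       /* NOT ECX -> length */
}
```

For a 5-character string the scan examines 6 bytes, leaving ECX = -7. INC gives -6, and NOT (which computes -x - 1) gives 5.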
3. Pattern match. Assume that the base addresses of the strings are in registers ESI and 
EDI and their length is in register ECX. Set the Zero flag to 1 if the strings match 
and to 0 if they do not. 

        CLD                     ;SET AUTOINCREMENTING
        REPE    CMPSB           ;KEEP COMPARING CHARACTERS
                                ; UNTIL ALL ARE EXAMINED OR
                                ; CORRESPONDING CHARACTERS
                                ; ARE NOT THE SAME

To set register AL to 1 if the strings match and to 0 if they do not, end the program 
with 

        SETE    AL              ;MOVE ZERO FLAG TO AL

You could also use SETE to store a boolean value in memory. 
4. Convert a hexadecimal digit to ASCII. Assume that the hexadecimal digit is in AL 
originally: 

        CMP     AL,10           ;IS DIGIT 10 OR LARGER?
        JB      ADDAZ
        ADD     AL,7            ;YES, ADD AN EXTRA 7
ADDAZ:  ADD     AL,30H          ;ADD ASCII ZERO. END WITH ASCII
                                ; DIGIT IN AL

The following method, credited to Dennis Allison, works with no branches at all 
(don’t ask me why!): 

        ADD     AL,90H          ;DEVELOP EXTRA 6 AND CARRY
        DAA
        ADC     AL,40H          ;ADD CARRY, ASCII OFFSET
        DAA
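
One way to see why the branch-free method works is to trace it with a small software model of the flags. The C sketch below is an assumption-laden illustration, not a definitive emulator: the cpu structure and helper names are invented, and the daa function follows Intel's published description of DAA, omitting only the intermediate carry that the second adjustment always overwrites.

```c
#include <stdint.h>

/* Minimal model of the three pieces of state these instructions touch. */
struct cpu { uint8_t al; int cf; int af; };

/* ADD/ADC AL,imm8: cf and af are the carries out of bit 7 and bit 3. */
static void adc8(struct cpu *c, uint8_t imm, int carry_in)
{
    unsigned sum = (unsigned)c->al + imm + (unsigned)carry_in;
    c->af = ((c->al ^ imm ^ sum) & 0x10) != 0;
    c->cf = sum > 0xFF;
    c->al = (uint8_t)sum;
}

/* DAA: adjust each nibble that overflowed its decimal range. */
static void daa(struct cpu *c)
{
    uint8_t old_al = c->al;
    int old_cf = c->cf;
    if ((c->al & 0x0F) > 9 || c->af) {
        c->al = (uint8_t)(c->al + 6);
        c->af = 1;
    } else {
        c->af = 0;
    }
    if (old_al > 0x99 || old_cf) {
        c->al = (uint8_t)(c->al + 0x60);
        c->cf = 1;
    } else {
        c->cf = 0;
    }
}

uint8_t hex_digit_to_ascii(uint8_t digit)
{
    struct cpu c = { digit, 0, 0 };
    adc8(&c, 0x90, 0);      /* ADD AL,90H */
    daa(&c);                /* A-F: 9A-9F wrap to 00-05 with Carry = 1 */
    adc8(&c, 0x40, c.cf);   /* ADC AL,40H */
    daa(&c);                /* 0-9: D0-D9 wrap to 30-39 */
    return c.al;
}
```

For digits 0 through 9 the first DAA changes nothing, and the second turns D0-D9 into 30-39. For A through F the first DAA wraps the value to 00-05 and sets Carry, so the ADC produces 41-46 and the second DAA has nothing left to adjust.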


5. Fill a block of memory. Assume that the base address of the block is in register EDI, 
the value to be stored there is in register AL, and the size of the block is in register 
ECX: 

        CLD                     ;SET AUTOINCREMENTING
        REP     STOSB           ;FILL THE BLOCK

This will run faster if you fill a word or a double word each time rather than just a 
byte. Of course, you must divide the size by 2 or 4. 

6. Add entry to list. Add the 32-bit element in EAX to a list if it is not already there. 
The list starts at address ELEMS and its length is at address COUNT: 

        MOV     EDI,ELEMS       ;GET BASE ADDRESS OF LIST
        MOV     ECX,[COUNT]     ;GET LENGTH OF LIST
        CLD                     ;SET AUTOINCREMENTING
        REPNE   SCASD           ;LOOK FOR ENTRY IN LIST
        JE      DONE            ;EXIT IF ENTRY ALREADY
                                ; IN LIST
        MOV     [EDI],EAX       ;ENTRY NOT IN LIST, SO PUT
                                ; IT THERE
        INC     DWORD PTR [COUNT] ;AND ADD 1 TO LENGTH
DONE:   NOP

At the end of the implied loop (REPNE SCASD), the Z flag is 1 if the exit occurred 
because the element was found. Z is 0 if the exit occurred because ECX was counted 
down to zero. 

7. Remove entry from list. Remove the 32-bit element in EAX from a list if it is there. 
The list starts at address ELEMS and its length is at address COUNT: 

        MOV     EDI,ELEMS       ;GET BASE ADDRESS OF LIST
        MOV     ECX,[COUNT]     ;GET LENGTH OF LIST
        CLD                     ;SET AUTOINCREMENTING
        REPNE   SCASD           ;LOOK FOR ELEMENT IN LIST
        JNE     DONE            ;EXIT IF ELEMENT NOT IN LIST
        MOV     ESI,EDI         ;SET POINTERS TO COMPACT
                                ; REST OF LIST
        SUB     EDI,4           ; DEST = SOURCE - 4
        REP     MOVSD           ;COMPACT REST OF LIST
        DEC     DWORD PTR [COUNT] ;SUBTRACT 1 FROM LENGTH
DONE:   NOP

At the end of the implied loop (REPNE SCASD), the Z flag is 1 if the exit occurred 
because the element was found. Z is 0 if the exit occurred because ECX was counted 
down to zero. 
8. Absolute value. Take the absolute value of the 32-bit number in EAX: 

        TEST    EAX,EAX         ;CHECK SIGN OF NUMBER
        JNS     NOTNEG          ;JUMP (DO NOTHING) IF NUMBER
                                ; IS POSITIVE
        NEG     EAX             ;NEGATE IF NUMBER IS NEGATIVE
NOTNEG: NOP                     ;CONTINUE WITH ABSOLUTE VALUE
                                ; IN EAX

The following method (courtesy of a Microsoft advertisement) uses no branches but 
changes EDX as well as EAX: 

        CDQ                     ;EXTEND HIGH BIT OF NUMBER
                                ; INTO EDX, SO [EDX] =
                                ; FFFFFFFF IF NUMBER NEGATIVE
                                ; AND 0 IF NUMBER POSITIVE
        XOR     EAX,EDX         ;TAKE 1'S COMPLEMENT IF
                                ; NUMBER NEGATIVE, LEAVE IT
                                ; UNCHANGED IF IT IS POSITIVE
                                ; XOR WITH 1S INVERTS BITS
        SUB     EAX,EDX         ;COMPUTE 2'S COMPLEMENT IF
                                ; NUMBER NEGATIVE BY
                                ; SUBTRACTING -1. NO EFFECT IF
                                ; NUMBER POSITIVE AS [EDX] = 0

Remember that EXCLUSIVE-ORing a bit with 1 inverts its value, whereas 
EXCLUSIVE-ORing it with 0 has no effect. 
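
The branch-free sequence can be checked with a short C model. The sketch uses unsigned 32-bit arithmetic to stand in for the registers, and the function name is hypothetical.

```c
#include <stdint.h>

/* Model of CDQ / XOR EAX,EDX / SUB EAX,EDX on 32-bit two's complement. */
uint32_t abs32(uint32_t eax)
{
    uint32_t edx = 0u - (eax >> 31);  /* CDQ: FFFFFFFF if negative, else 0 */
    eax ^= edx;                       /* 1's complement if negative */
    eax -= edx;                       /* subtracting -1 adds 1 */
    return eax;
}
```

Like the NEG version, this leaves the most negative value (80000000 hex) unchanged, because its absolute value does not fit in a signed 32-bit number.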


PARAMETER PASSING TECHNIQUES 


The common ways to pass parameters on the 80386 microprocessor are through 
registers or on the stack. The register approach is fast and simple for subroutines that 
take only one or two parameters and return only one or two results. The stack approach 
is more general and can use the special ENTER and LEAVE instructions. 

A typical register-based approach extends Intel’s standard PL/M-86 function interface 
as follows: 

1. A single data value is passed or returned in AL, AX, or EAX, depending on its length. 
2. A single address (offset) is passed or returned in EBX. 

This avoids elaborate stack manipulations. 

Figure 3-8 
Stack contents on entry to the working part of a subroutine. 

Stack-based approaches generally use the EBP register. The typical procedure at the 
start of a subroutine is: 


1. Save register EBP in the stack. 

2. Set EBP to the current value of the stack pointer (called the frame pointer). 

3. Reduce the stack pointer by an amount sufficient to allow for local storage within 
the subroutine. 


We refer to the stack area containing parameters, register values, the return address, 
and local storage as the current frame. Figure 3-8 shows a frame with parameters at the 
bottom (put in the stack by the calling program). Above them are the return address, 
the previous frame pointer, and local storage. Remember that the stack grows down 
(toward lower addresses). 

The ENTER instruction takes care of all three entry steps. It can even handle nested 
subroutines that must preserve frame pointers from lower lexical levels. The general 
form is 

        ENTER   IMM16,IMM8

The first operand is the number of bytes in the local storage area. The second operand 
is the lexical level. 

Within a subroutine, you can then address items as follows: 

• Parameters with positive offsets from the frame pointer. These offsets 
must account for the return address and the previous frame pointer. 

• Local variables with negative offsets from the frame pointer. 

Before returning, the subroutine must clean the stack as follows (see Figure 3-8): 

1. Set the stack pointer to the frame pointer, thus removing local variables. 
2. Pop the previous frame pointer from the stack. 
3. Increase the stack pointer to remove the parameters from the stack. 


The LEAVE instruction does the first two steps. A special RET followed by a parameter 
does the third step. Of course, this approach assumes that the subroutine preserves the 
frame pointer EBP. 
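
The entry and exit steps can be traced with a toy model of the stack. The following C sketch is illustrative only: the machine structure, the 256-byte stack, and all function names are assumptions, and enter0 models ENTER with a lexical level of 0 only.

```c
#include <stdint.h>

#define STACK_BYTES 256

struct machine {
    uint32_t esp, ebp;
    uint32_t mem[STACK_BYTES / 4];   /* stack memory, byte addresses / 4 */
};

void push32(struct machine *m, uint32_t v)
{
    m->esp -= 4;                     /* the stack grows down */
    m->mem[m->esp / 4] = v;
}

uint32_t pop32(struct machine *m)
{
    uint32_t v = m->mem[m->esp / 4];
    m->esp += 4;
    return v;
}

/* ENTER locals,0 (lexical level 0 only). */
void enter0(struct machine *m, uint16_t locals)
{
    push32(m, m->ebp);     /* 1. save the caller's frame pointer */
    m->ebp = m->esp;       /* 2. establish the new frame pointer */
    m->esp -= locals;      /* 3. reserve local storage */
}

/* LEAVE */
void leave(struct machine *m)
{
    m->esp = m->ebp;       /* 1. discard local storage */
    m->ebp = pop32(m);     /* 2. restore the caller's frame pointer */
}

/* RET n: pop the return address, then remove n bytes of parameters. */
uint32_t ret_n(struct machine *m, uint16_t n)
{
    uint32_t return_address = pop32(m);
    m->esp += n;
    return return_address;
}

/* Read the double word at [EBP + offset]. */
uint32_t frame_read(const struct machine *m, int32_t offset)
{
    return m->mem[(m->ebp + (uint32_t)offset) / 4];
}
```

With one 4-byte parameter pushed before the call, the parameter sits at [EBP+8], the return address at [EBP+4], and the saved frame pointer at [EBP]. Local variables start at [EBP-4].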


MAKING PROGRAMS RUN FASTER 


To speed up a program effectively, you must first find its most frequently executed 
loops. You must then remove redundant or unnecessary operations from them. Remember 
that the LEA instruction can help avoid repetitive address calculations. 

After this optimization, the best way to speed up 80386 assembly language programs 
is to eliminate jumps. They are particularly time consuming because they force the 
processor to clear its pipeline. Thus they cause extra delays while the pipeline is being 
refilled. Ways to eliminate jumps include: 

• Reorganize loops by changing the initial conditions. This often allows a 
single jump at the end of a loop (a do-until structure rather than a do-while). 

• Structure the logic to minimize the number of times a conditional jump 
is taken. If, for example, one case is infrequent (such as an overflow or 
an exact match), make it cause a jump rather than having the common 
case cause one. 

• Use in-line code rather than subroutines. 

Other ways to reduce execution time include: 

• Align all memory accesses. While the 80386 does unaligned accesses, 
they take extra time. So you should keep procedure entry points, looping 
destinations, and frequently used data at double-word aligned addresses. 
You can force alignment by filling data areas, putting NOPs in the 
program, or using the ALIGN and EVEN assembler directives. 

• Use the string primitives and REP whenever possible. They are faster 
and shorter than sequences with explicit increments and decrements. 

• Use registers for their intended purposes. For example, use ECX as a 
counter, EAX as an accumulator, EBX as a base register, and ESI and 
EDI as index registers. While the 80386 allows general assignments of 
registers, there is a time and memory penalty. 

• Use other specialized instructions such as XLAT, LOOP, and the BCD 
and ASCII (packed and unpacked) arithmetic instructions. 

• Use instructions that store their results in memory. They can help you 
avoid saving and restoring registers. 

A general way to reduce execution time is by replacing long instruction sequences 
with tables. A table lookup can replace an instruction sequence if it has no special exits 
or complex logic. Tables make sense even if many of their entries are the same. After 
all, it’s only cheap memory that you’re wasting. What’s a mere kilobyte among friends 
these days? It’s not like the long-past 1970s when a man’s worth was measured by how 
many bytes he could remove from his code. Tables take extra memory, but lookup 
methods are fast, general, easy to program, and easy to change. 
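
As one illustration of trading memory for time, the following C sketch counts the 1 bits in a byte, first with a loop and then with a 256-entry table. The task and all names are chosen for the example and do not come from the text. The table is built once from the slow version, so the two cannot disagree.

```c
#include <stdint.h>

static uint8_t ones_table[256];
static int table_ready = 0;

/* Instruction-sequence version: examine the bits one at a time. */
unsigned count_ones_loop(uint8_t value)
{
    unsigned count = 0;
    while (value != 0) {
        count += value & 1u;
        value >>= 1;
    }
    return count;
}

/* Build the 256-entry table once, using the slow version. */
static void build_table(void)
{
    for (unsigned i = 0; i < 256; i++)
        ones_table[i] = (uint8_t)count_ones_loop((uint8_t)i);
    table_ready = 1;
}

/* Table version: a single indexed load once the table exists. */
unsigned count_ones_table(uint8_t value)
{
    if (!table_ready)
        build_table();
    return ones_table[value];
}
```

In 80386 assembly language the lookup itself is a natural fit for XLAT or for an indexed MOV, and the table costs only 256 bytes.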





COMMON PROGRAMMING ERRORS 


The most common errors in 80386 assembly language programs are the following: 

• Reversing the order of operands, particularly in MOV and CMP instructions. 
Remember that the destination comes first. The order is backward 
in moves but normal in comparisons. That is, CMP op1,op2 computes 
op1 - op2. 

• Using the flags incorrectly. The usual problems are trying to jump after 
instructions (such as MOV or IN) that do not affect the flags and overlooking 
instructions (such as shifts or SUB reg,reg) that do affect them. 

• Getting the logic wrong in conditional jumps. The usual problems are inverting 
the logic (for example, using JNZ instead of JZ) or jumping incorrectly 
when operands are equal. Note that comparing equal values 
clears the Carry. A quick hand check of each jump can avoid many 
problems. 

• Confusing addresses and data. A common typing error is to omit the 
brackets around an address. A common logical error is to ignore the effects 
of indirection or to forget how it applies to jumps and calls. 

• Mishandling arrays and strings. The usual problem is going beyond the 
boundaries. Note, for example, that an array starting at BASE and having 
N 8-bit elements occupies addresses BASE through BASE+N-1. The 
BOUND instruction can help you avoid improper array references. 

• Organizing the program improperly. The usual problems are skipping or 
repeating initialization routines, failing to update counters or pointers, 
and failing to save results. 


SUMMARY 


Assembly language programming for the 80386 microprocessor is very similar to 
programming for its predecessors, the 8086 and 80286. Simple programs can use MOV 
and arithmetic and logical instructions to manipulate registers. Most instructions have 
8-, 16-, and 32-bit versions. 

Decision making involves conditional jumps and the bit manipulation and comparison 
instructions. Bit manipulation instructions move a bit value to the Carry flag. 
The CMP (compare) instruction affects the Carry, Zero, Sign, and Overflow flags. 
Conditional jumps are available for all signed and unsigned results. 

Multiple-precision arithmetic operations depend on the Carry flag to transfer carries 
or borrows between iterations. There are also special instructions for packed and 
unpacked BCD operations. These instructions work on 1 byte at a time and require data 
to be in the accumulator. 

Array, string, table, and data structure manipulation depend on the use of index and 
base registers. Indexed and based addressing are the key modes, particularly when combined 
with displacements and scale factors. String primitives simplify string and array 
handling by combining simple operations with the updating of one or two pointers. The 
REP prefix forms an implicit loop that repeats a string primitive a specific number of 
times. The LOOP instruction provides a simple decrement-and-jump-if-not-zero 
capability. 

After standard optimization techniques have been used, the best way to make 80386 
programs run faster is by reducing the number of jumps. The most common errors in 
80386 programs are reversing the order of operands, using the flags incorrectly, inverting 
decision logic, confusing addresses and data, handling arrays and strings incorrectly, 
and organizing sequences improperly. 


It (marriage) happens as with cages: the birds without 
despair to get in, and those within despair to get out. 
Michel Eyquem de Montaigne, Essays 


Talk of court news; and we'll talk with them too, 
Who loses and who wins; who’s in, who’s out; 
And take upon’s the mystery of things... . 

William Shakespeare, King Lear 


This chapter describes 80386 input/output. It first explains alternative I/O methods. It 
then discusses addressing, I/O instructions, and I/O chips. Later sections present ex- 
amples and describe interrupts and direct memory access (DMA). 


ALTERNATIVE I/O METHODS 
The major approaches to I/O with any processor are the following: 


1. Programmed I/O. Here everything occurs under program control. Software must 
determine whether devices are ready and select the order in which to handle them. 
We refer to checking a device’s status as polling. This method is best suited to low-speed 
peripherals such as switches and small displays. 

2. Interrupt-driven I/O. Here a special signal that goes directly into the CPU (rather 
than to an I/O port) indicates that a device is requesting a transfer. The processor 
responds by suspending its current activities and servicing the request. This method 
is best suited to medium-speed peripherals such as terminals, communications 
lines, plotters, and printers. 

3. DMA (direct memory access) transfers. Here data moves directly between memory 
and I/O without processor intervention. The CPU must usually load external 
registers and counters with a base address and the block size. The processor then 
simply waits in a suspended state while the actual transfer occurs. This method is 
best suited to high-speed peripherals such as disks and signal processing or image 
processing boards. 


The 80386 allows all these methods. It also provides a clocked approach called block 
input/output that falls between interrupt-driven I/O and DMA in performance. This 
method transfers a fixed amount of data between memory and a peripheral that runs at 
or close to processor speed. 


I/O ADDRESSING 


The 80386 allows two ways of addressing I/O ports: 

• Separate I/O addresses (called isolated input/output) 
• Allocation of memory addresses to input/output (called memory-mapped 
input/output) 

Figures 4-1 and 4-2 show how these approaches assign space to I/O and memory addresses. 

Isolated I/O is by far the more common approach. The 80386 allows 64K of separate 
I/O addresses. The actual ports may be any combination of 8-, 16-, and 32-bit units. 
I/O instructions use only the 16 least significant address bits. There is no distinction 
between logical and physical addresses, and neither segmentation nor paging applies. 

Isolated I/O clearly separates the I/O and memory sections. One reason for doing 
this is that the sections have different basic units. Memory comes in units of thousands 
or millions of bytes. Input/output, on the other hand, comes in the form of individual 
ports or packages containing a few ports. Handling both sections with one addressing 
system is like assigning space on a city block to both huge skyscrapers and single-family 
homes. 

Figure 4-1 
Isolated I/O with separate memory and I/O address spaces. 

Figure 4-2 
Memory-mapped I/O with a single address space. 

Isolated I/O also makes it easy to find I/O instructions in programs for debugging 
or analysis. Besides, it emphasizes the fact that I/O and memory behave differently. 
Peripherals are generally unidirectional, usually operate much more slowly than the 
CPU, and may not retain their contents like a memory. For example, no matter how 
one addresses a keyboard or printer, neither behaves like a memory. The keyboard does 
not respond to data sent to it, nor does the printer have much of interest to say. Neither 
peripheral can operate nearly as fast as the processor. Furthermore, keyboard data changes 
as the operator presses keys, and output may not be readable even while the printer 
is working on it. Thus isolated I/O is often the more natural approach, particularly for 
low- and medium-speed peripherals. 

Memory-mapped I/O, on the other hand, also has advantages. It lets the programmer 
apply the entire instruction set to I/O devices. It also makes it easy to direct I/O to 
or from memory buffers, where special controllers can then handle it at their own rate. 
The single address space is easier to manage and requires fewer control signals. 
Memory-mapped I/O is particularly well-suited to situations involving complex I/O 
devices with their own local memory. As trends clearly favor removing major responsibility 
for I/O from the main processor, the use of memory-mapped I/O will probably 
increase in the future. 

Note that memory-mapped I/O reduces a computer’s memory capacity. The system 
must dedicate some address space to I/O, as shown in Figure 4-2. Because of hardware 
constraints and fixed assignments, this space is in the middle in most PCs. The result 
is a discontinuous address space and a great deal of confusion. In practice, the I/O ad- 
dress space is often made quite large to simplify decoding. However, wasting thousands 
of addresses is clearly far less serious in the 4-GB space of the 80386 than it was in the 
more limited spaces of earlier processors. 

Instructions can specify isolated I/O addresses in two ways on the 80386: 

• As immediate 8-bit constants. This method applies only to ports 00 
through FF hex. However, Intel has reserved ports F8 through FF for use 
with numeric coprocessors (see Chapter 8). 

• Via register DX (not EDX). This method provides access to any I/O port 
but at the cost of reserving register DX. The dedication of DX conflicts 
with its other uses, such as in extension, division, and multiplication. No 
alternative to DX is allowed. 





I/O INSTRUCTIONS 


The 80386’s I/O instructions are: 

- IN accumulator,port address and OUT port address,accumulator move 
data between an accumulator and the specified absolute port address. The 
accumulator may be AL, AX, or EAX. The port address must be in the 
range 00 through FF hex (actually 00 through F7, as Intel reserves F8 through 
FF for the coprocessor). 

- IN accumulator,DX and OUT DX,accumulator move data between an 
accumulator and the port addressed via register DX. The accumulator 
may be AL, AX, or EAX. The port address may be anywhere in the range 
0000 through FFFF hex. Only register DX may be used in this way. 

- INS (input string) moves data from the input port addressed via register 
DX to the memory location addressed via register EDI. It then either adds 
a step (1, 2, or 4) to EDI or subtracts a step from it, depending on whether 
the D flag is 0 or 1. 

- OUTS (output string) moves data from the memory location addressed 
via register ESI to the output port addressed via register DX. It then either 
adds a step (1, 2, or 4) to ESI or subtracts a step from it, depending on 
whether the D flag is 0 or 1. 

Both INS and OUTS have byte, word, and double word versions. As with other 
string instructions, the step size depends on the amount of data transferred. Note that 
INS uses register EDI as its pointer, whereas OUTS uses ESI. 

In INS and OUTS, the data never passes through a user register. The result is like a 
low-speed DMA channel operating under program control. In particular, note that the 
accumulator is not involved. 

We can repeat either INS or OUTS with REP, producing block input/output. 
However, the peripheral must be able to transfer data at close to processor speed. The 
tight loops 

     REP INS 

or 

     REP OUTS 


do not allow a software delay between operations. The peripheral must be able to 
provide or accept new data each time it is addressed. The hardware can impose a short 
delay. In practice, this approach makes sense only for special controllers. Of course, 
INS and OUTS may still be useful steps in longer loops without REP. 
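
To make the behavior of the repeated form concrete, here is a small C model of REP INSB. It is only a sketch under stated assumptions: memory is an ordinary array, EDI is an index into it, and the peripheral is a function (the hypothetical port_reader type and the demo counting_port) rather than real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the peripheral: returns the next data byte. */
typedef uint8_t (*port_reader)(uint16_t port);

/*
 * Model of REP INSB. 'mem' plays the role of memory, 'edi' is an index
 * into it, 'ecx' is the repeat count, 'dx' is the port address, and
 * 'd_flag' is the D flag (0 = add the step, 1 = subtract it).
 * Returns the final value of EDI.
 */
uint32_t rep_insb_model(uint8_t *mem, uint32_t edi, uint32_t ecx,
                        uint16_t dx, int d_flag, port_reader read_port)
{
    assert(mem != 0 && read_port != 0);
    while (ecx != 0) {
        mem[edi] = read_port(dx);          /* no accumulator is involved   */
        edi = d_flag ? edi - 1 : edi + 1;  /* step is 1 for the byte form  */
        ecx--;
    }
    return edi;
}

/* Demo peripheral: hands out 10, 11, 12, ... hex on successive reads. */
static uint8_t next_value = 0x10;

uint8_t counting_port(uint16_t port)
{
    (void)port;
    return next_value++;
}
```

Note that the loop has no place for a delay or a status test, which is exactly why the peripheral must keep up with the processor.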


PROGRAMMABLE I/O CHIPS 


Most 80386 I/O sections contain programmable I/O chips. These devices have many 
different operating modes. A program selects among them by storing values in control 
or command registers. The advantages of programmable I/O devices are: 
- They take less board space, use less power, and are cheaper to install than 
circuits made from less integrated parts. These are obviously important 
factors in personal computers. 

- They can handle a wide range of applications, so boards and computers 
based on them can serve many purposes. This is clearly an advantage to 
manufacturers of chips, boards, and computers. 

- Changes and corrections can be made in software rather than in hardware. 

Programmable I/O devices also have some disadvantages, such as: 

- Lack of standardization. Each part has its own set of operating modes 
and ways to select them. The user must generally learn the idiosyncrasies 
of a particular chip from experience. For most users, the chips are “black 
boxes” that do their jobs in some unknown way. 

- Higher cost than less integrated parts. 

- Limited documentation. The user must rely on manufacturers’ data 
sheets and an occasional application note. 

Figure 4-3 
Block diagram of the 8250 Asynchronous Communications Element (ACE). 

The following programmable I/O devices are popular in 80386-based computers: 

- 8250 asynchronous communications element (ACE), a serial (1-bit) 
interface. Lucky they didn’t call it the asynchronous serial signaler. We 
often call devices like the 8250 UARTs (universal asynchronous 
receiver/transmitters). 

- 8255 programmable peripheral interface (PPI), a parallel (8- or 16-bit) 
interface. Don’t ask me what the name means. It’s probably computerese 
for “thingamajig.” 

- 8253 or 8254 programmable interval timer (PIT), a timing device. 

Many similar devices are available from several manufacturers. We will briefly 
describe the 8250, 8255, and 8253 because of their widespread use in computers 
such as IBM PCs. They are also simpler and easier to understand than many newer 
devices, yet they perform the same functions. 


8250 ACE 


The 8250 ACE is a popular interface between a microprocessor that handles data in 
parallel (8, 16, or 32 bits at a time) and peripherals that handle data serially (1 bit at a 
time). Serial interfaces are popular (particularly over long distances) because of their 
low cost. After all, they require only one data line. Common serial peripherals include 
terminals, modems, printers, and mice (mouses?). Figure 4-3 is a block diagram of the 
8250 device. Its features include: 

- Interrupt-driven or programmed operation. 

- Buffering to eliminate the need for precise synchronization. The device 
thus holds the data until the processor or peripheral is ready for it. 

- Modem and RS-232 control signals. RS-232 is a standard serial interface 
defined by the Electronic Industries Association (EIA). 

- Choice of common options for number of bits per character (5 to 8), parity 
(even, odd, or none), and number of stop bits (idle periods) between 
characters (1, 1 1/2, or 2). 

- Timer (called a baud rate generator) that provides the intervals between 
bits. 

- Error-checking facilities. These include the ability to screen out short 
noise pulses and to recognize problems such as overrun, incorrect parity, 
and improper data structure (or framing). Overrun means that the device 
received a new character before the old one was read. 


Table 4-1 
Summary of 8250 ACE registers 

Register Address   Register 

0 (DLAB = 0)       Receiver Buffer Register (Read Only), RBR 
0 (DLAB = 0)       Transmitter Holding Register (Write Only), THR 
1 (DLAB = 0)       Interrupt Enable Register, IER 
2                  Interrupt Ident. Register (Read Only), IIR 
3                  Line Control Register, LCR 
4                  MODEM Control Register, MCR 
5                  Line Status Register, LSR 
6                  MODEM Status Register, MSR 
7                  Scratch Register, SCR 
0 (DLAB = 1)       Divisor Latch (LS), DLL 
1 (DLAB = 1)       Divisor Latch (MS), DLM 

Bit assignments (bit 0 first): 

RBR, THR   Data Bit 0* through Data Bit 7 
IER        0 Enable Received Data Available Interrupt (ERBFI); 1 Enable 
           Transmitter Holding Register Empty Interrupt (ETBEI); 2 Enable 
           Receiver Line Status Interrupt (ELSI); 3 Enable MODEM Status 
           Interrupt (EDSSI); 4 through 7 are 0 
IIR        0 is "0" if Interrupt Pending; 1 Interrupt ID Bit (0); 2 Interrupt 
           ID Bit (1); 3 through 7 are 0 
LCR        0 Word Length Select Bit 0 (WLS0); 1 Word Length Select Bit 1 
           (WLS1); 2 Number of Stop Bits; 3 Parity Enable; 4 Even Parity 
           Select; 5 Stick Parity; 6 Set Break; 7 Divisor Latch Access Bit 
           (DLAB) 
MCR        0 Data Terminal Ready (DTR); 1 Request to Send (RTS); 2 Out 1; 
           3 Out 2; 4 Loop; 5 through 7 are 0 
LSR        0 Data Ready (DR); 1 Overrun Error; 2 Parity Error; 3 Framing 
           Error; 4 Break Interrupt; 5 Transmitter Holding Register Empty 
           (THRE); 6 Transmitter Empty (TEMT); 7 is 0 
MSR        0 Delta Clear to Send (DCTS); 1 Delta Data Set Ready (DDSR); 
           2 Trailing Edge Ring Indicator (TERI); 3 Delta Data Carrier 
           Detect (DDCD); 4 Clear to Send; 5 Data Set Ready; 6 Ring 
           Indicator (RI); 7 Data Carrier Detect (DCD) 
SCR        Bit 0 through Bit 7 
DLL        Bit 0 through Bit 7 
DLM        Bit 8 through Bit 15 

*Bit 0 is the least significant bit. It is the first bit serially transmitted or received. 


Table 4-2 
8250 ACE baud rates using 1.8432 MHz Crystal 

Columns: Desired Baud Rate; Divisor Used to Generate 16 x Clock; 
Percent Error Difference Between Desired and Actual 

Note: 1.8432 MHz is the standard 8080 frequency divided by 10. 

Table 4-1 summarizes the 8250's registers. Data passes through register 0. The 
others act as control or status registers. The baud rate generator can produce any com- 
mon data rate (such as 110, 300, 1200, 2400, 4800, or 9600 baud) with the aid of a 
divisor as shown in Table 4-2. 
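
The divisor arithmetic is easy to check by hand or by program. The C sketch below assumes the 1.8432 MHz crystal of Table 4-2 and a baud rate generator that produces a clock 16 times the data rate; rounding to the nearest integer is our assumption, and the names ACE_CRYSTAL_HZ, ace_divisor, ace_dll, and ace_dlm are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define ACE_CRYSTAL_HZ 1843200UL   /* 1.8432 MHz crystal, as in Table 4-2 */

/* Divisor that brings the 16 x clock closest to the desired baud rate. */
uint16_t ace_divisor(unsigned long baud)
{
    unsigned long step = 16UL * baud;
    assert(baud >= 2 && baud <= 115200UL);   /* keeps the result in 16 bits */
    return (uint16_t)((ACE_CRYSTAL_HZ + step / 2) / step);
}

/* The two bytes that go into the divisor latches (DLL and DLM). */
uint8_t ace_dll(uint16_t divisor) { return (uint8_t)(divisor & 0xFFu); }
uint8_t ace_dlm(uint16_t divisor) { return (uint8_t)(divisor >> 8); }
```

For 9600 baud the divisor is 12. For 110 baud it is 1047 (0417 hex), so DLL gets 17 hex and DLM gets 04 hex. The program must set DLAB to 1 before writing the latches, since they share addresses 0 and 1 with other registers.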
Figure 4-4 
Block diagram of the 8255 Programmable Peripheral Interface 
(PPI). 


8255 PPI 


The 8255 PPI is a parallel interface intended for peripherals such as printers, plotters, 
and analog I/O boards that handle data 8 or 16 bits at a time. Figure 4-4 is a block 
diagram of the device. Its key features are: 

- Two 8-bit ports and two 4-bit ports that can be either input or output 

- Direct bit set/reset capability for the status and control port (port C) 

- Automatic status and control signals for either unidirectional or 
bidirectional transfers 

Figures 4-5 and 4-6 summarize the PPI’s options. Its operating modes are: 

- Mode 0, in which all ports work independently as either inputs or 
outputs. There are separate input or output selection bits for the top and 
bottom halves of port C (bits 0 through 3 and 4 through 7). 

- Mode 1, in which bits of port C act as status and control signals for ports 
A and B. The status signals (STROBE or ACKNOWLEDGE) indicate 
whether the peripheral is ready to transfer data. The control signals 
(INTERRUPT REQUEST and BUFFER FULL) request service from the 
processor and tell the peripheral whether a transfer may proceed. 

- Mode 2, in which most of port C acts as status and control signals for 
bidirectional transfers through port A. This mode is uncommon in 
practice. 


CONTROL WORD 

D7      MODE SET FLAG: 1 = ACTIVE 
D6-D5   GROUP A MODE SELECTION: 00 = MODE 0, 01 = MODE 1, 1X = MODE 2 
D4      GROUP A, PORT A: 1 = INPUT, 0 = OUTPUT 
D3      GROUP A, PORT C (UPPER): 1 = INPUT, 0 = OUTPUT 
D2      GROUP B MODE SELECTION: 0 = MODE 0, 1 = MODE 1 
D1      GROUP B, PORT B: 1 = INPUT, 0 = OUTPUT 
D0      GROUP B, PORT C (LOWER): 1 = INPUT, 0 = OUTPUT 

Figure 4-5 
Mode definition format for the 8255 Programmable Peripheral Interface (PPI). 


CONTROL WORD 

D7      BIT SET/RESET FLAG: 0 = ACTIVE 
D6-D4   DON'T CARE 
D3-D1   BIT SELECT (0 through 7) 
D0      BIT SET/RESET: 1 = SET, 0 = RESET 

Figure 4-6 
Bit set/reset format for the 8255 Programmable Peripheral Interface 
(PPI). 
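
The two control word formats of Figures 4-5 and 4-6 can be assembled mechanically. The following C sketch shows one way to do it, using the usual 8255 bit layout (D7 as the flag, D0 as the port C lower or set/reset bit); the names ppi_dir, ppi_mode_word, and ppi_bit_word are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

enum ppi_dir { PPI_OUTPUT = 0, PPI_INPUT = 1 };

/* Mode definition word: group_a_mode is 0 to 2, group_b_mode is 0 or 1. */
uint8_t ppi_mode_word(unsigned group_a_mode, enum ppi_dir port_a,
                      enum ppi_dir port_c_upper, unsigned group_b_mode,
                      enum ppi_dir port_b, enum ppi_dir port_c_lower)
{
    assert(group_a_mode <= 2 && group_b_mode <= 1);
    return (uint8_t)(0x80u                          /* D7: mode set flag   */
                     | (group_a_mode << 5)          /* D6-D5: group A mode */
                     | ((unsigned)port_a << 4)      /* D4: port A          */
                     | ((unsigned)port_c_upper << 3)/* D3: port C upper    */
                     | (group_b_mode << 2)          /* D2: group B mode    */
                     | ((unsigned)port_b << 1)      /* D1: port B          */
                     | (unsigned)port_c_lower);     /* D0: port C lower    */
}

/* Bit set/reset word for port C; D7 = 0 selects this format. */
uint8_t ppi_bit_word(unsigned bit, int set)
{
    assert(bit <= 7);
    return (uint8_t)((bit << 1) | (set ? 1u : 0u));
}
```

For example, mode 0 with every port an output gives 80 hex, mode 0 with every port an input gives 9B hex, and setting bit 7 of port C takes the word 0F hex.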


The 8255 PPI automatically manages the exchange of status and control signals 
(called handshaking). Handshake transfers proceed roughly as follows: 


1. After determining that the port is empty, the sender stores the data in it and activates 
a DATA READY signal. 

2. In response to the DATA READY signal, the receiver reads the data from the port 
and activates a DATA ACCEPTED signal. 


The signals thus validate the transfer, much as a human handshake signifies the accep- 
tance of an agreement. The 8255 PPI not only generates and latches (holds) signals but 
it also activates and deactivates them without further processor intervention. 


8253 and 8254 PITs 


The 8253 and 8254 timers are often used to generate real-time clocks and to measure 
time intervals between events. Figure 4-7 is a block diagram of the 8253 device. 

Figure 4-7 
Block diagram of the 8253 Programmable Interval Timer (PIT). 

Both timers have three independent 16-bit counters and can count in either binary 
or decimal. Their operating modes (selected with control words as shown in Figure 4-8) 
can create the following waveforms (see Figure 4-9): 

- Single wide pulse 

- Periodic series of narrow pulses 

- Square wave 

- Single narrow pulse at the end of an interval 
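
As an illustration, the control word of Figure 4-8 (fields SC, RL, M, and BCD from most to least significant) and a counter value can be computed as follows. This is only a sketch: the function names pit_control_word and pit_count are invented, the 1 MHz input clock in the example is an arbitrary assumption, and the count is limited to 1 through 65535 for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Control word: SC1 SC0 RL1 RL0 M2 M1 M0 BCD (bit 7 down to bit 0). */
uint8_t pit_control_word(unsigned counter, unsigned rl, unsigned mode, int bcd)
{
    assert(counter <= 2 && rl <= 3 && mode <= 5);
    return (uint8_t)((counter << 6) | (rl << 4) | (mode << 1) | (bcd ? 1u : 0u));
}

/* Count that brings (input clock / count) closest to the desired rate. */
uint16_t pit_count(unsigned long clock_hz, unsigned long output_hz)
{
    unsigned long count;
    assert(output_hz > 0 && clock_hz >= output_hz);
    count = (clock_hz + output_hz / 2) / output_hz;
    assert(count >= 1 && count <= 65535UL);   /* must fit a 16-bit counter */
    return (uint16_t)count;
}
```

Counter 0 as a binary square wave generator (mode 3), loaded least significant byte first, needs the control word 36 hex. With a 1 MHz input clock, a 1 kHz output requires a count of 1000.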


Control Word Format 

D7    D6    D5    D4    D3    D2    D1    D0 
SC1   SC0   RL1   RL0   M2    M1    M0    BCD 

Definition of Control 

SC (Select Counter): 00 = Select Counter 0, 01 = Select Counter 1, 
10 = Select Counter 2 
RL: 11 = Read/Load least significant byte first, then most significant byte 
BCD: 0 = Binary Counter 16 bits, 1 = Binary Coded Decimal (BCD) Counter 
(4 Decades) 

Figure 4-8 
Control word format for the 8253 Programmable Interval Timer (PIT). 


MODE 0: Interrupt on Terminal Count 
MODE 1: Programmable One-Shot 
MODE 2: Rate Generator 
MODE 3: Square Wave Generator 
MODE 4: Software Triggered Strobe 
MODE 5: Hardware Triggered Strobe 

Figure 4-9 
Timing diagrams for operating modes of 8253 Programmable Interval Timer (PIT). 


I/O EXAMPLES 


1. Jump to location SERVSW if a switch attached to bit BITNO of port IPORT is 
closed (0): 

        MOV   DX,IPORT      ;ACCESS INPUT PORT 
        IN    AL,DX         ;GET DATA 
        BT    EAX,BITNO     ;TEST SWITCH 
        JNC   SERVSW        ;JUMP IF SWITCH IS CLOSED (0) 

BT moves the switch’s value to the Carry. The 16-bit port address is in DX. Note 
that in isolated I/O, IN and OUT (and their variants INS and OUTS) are the only 
instructions that apply to ports. 

2. Clear bit position BITNO of port OPORT without affecting any other bits: 

        MOV   DX,OPORT      ;ACCESS OUTPUT PORT 
        IN    AL,DX         ;GET CURRENT DATA 
        BTR   EAX,BITNO     ;CLEAR ASSIGNED BIT 
        OUT   DX,AL         ;SEND DATA WITH BIT CLEARED 

This routine assumes that the output port is readable. If it is not, you must save a 
copy of the data in memory. The program must update the copy as well as the 
actual output data. 

3. Move a data byte from port IPORT into a buffer addressed via EDI when bit BITNO 
of port CPORT becomes 0. Assume that the buffer’s length is in register ECX: 

        MOV   DX,CPORT      ;ACCESS CONTROL PORT 
WTRDY:  IN    AL,DX         ;GET STATUS 
        BT    EAX,BITNO     ;TEST STATUS 
        JC    WTRDY         ;LOOP UNTIL STATUS ACTIVE (0) 
        CLD                 ;SELECT AUTOINCREMENTING 
        MOV   DX,IPORT      ;ACCESS INPUT PORT 
        INSB                ;MOVE DATA TO BUFFER 
        INC   ECX           ;ADD 1 TO BUFFER LENGTH 

Here CPORT contains a status input that acts just like a switch. The routine could 
also manage a control output much as it would handle a single light. 

4. Move a data byte from a buffer addressed via ESI to port OPORT when bit BITNO 
of port CPORT becomes 0. Assume that the buffer’s length is in register ECX: 

        MOV   DX,CPORT      ;ACCESS CONTROL PORT 
WTRDY:  IN    AL,DX         ;GET STATUS 
        BT    EAX,BITNO     ;TEST STATUS 
        JC    WTRDY         ;LOOP UNTIL STATUS ACTIVE (0) 
        CLD                 ;SELECT AUTOINCREMENTING 
        MOV   DX,OPORT      ;ACCESS OUTPUT PORT 
        OUTSB               ;SEND DATA FROM BUFFER 
        DEC   ECX           ;SUBTRACT 1 FROM BUFFER LENGTH 

The contents of the control port are still in AL at the end of the routine, since 
OUTSB does not use the accumulator. 
Table 4-3 
80386 interrupts and exceptions in numerical order 

Identifier   Description 

0            Divide error 
1            Debug exceptions 
2            Nonmaskable interrupt 
3            Breakpoint (one-byte INT 3 instruction) 
4            Overflow (INTO instruction) 
5            Bounds check (BOUND instruction) 
6            Invalid opcode 
7            Coprocessor not available 
8            Double fault 
9            (reserved) 
10           Invalid TSS 
11           Segment not present 
12           Stack exception 
13           General protection 
14           Page fault 
15           (reserved) 
16           Coprocessor error 
17-31        (reserved) 
32-255       Available for external interrupts via INTR pin 


INTERRUPTS 


More advanced I/O methods require a discussion of interrupts. Interrupts are external 
signals that cause the 80386 to suspend its normal activities and perform special 
routines. Exceptions are internal conditions or instructions that have the same effect. 
We will discuss interrupts here and exceptions in Chapter 7. 

The 80386 has two interrupt inputs: nonmaskable (NMI) and maskable (INTR). 
Nonmaskable means that the interrupt cannot be shut out (the technical terms are dis- 
armed, masked, or disabled). The maskable interrupt is controlled by IF (the interrupt 
enable flag). IF is bit 9 of the flags register (see Figure 2-3). Setting IF enables maskable 
interrupts; clearing it disables them. 

Each interrupt has an identifier or interrupt type (see Table 4-3). So do exceptions, 
which include conditions such as divide errors, invalid operation codes, and other 
problems that we have not yet described. Note that Intel reserves interrupt types 0 
through 31 for its own purposes, although it has not yet specified the meanings of all 
of them. In the past, however, Microsoft and IBM have not respected Intel’s claims 
and have used reserved interrupts for MS-DOS functions. 

Figure 4-10 
Saving the status of the 80386 microprocessor in the stack. 

The 80386 responds to an interrupt or exception as follows: 


1. It saves the current flags, instruction pointer, and code segment register in the stack. 
Figure 4-10 shows the order, assuming 32-bit registers (CS is extended with zeros). 
The 16-bit case is similar, except that CS is not extended. 

2. It clears IF so that other maskable interrupts will not be recognized until they are 
specifically allowed. 

3. It gets new values for the code segment register and instruction pointer from 
memory. The locations used depend on the interrupt type and on the memory 
management approach (see Chapter 5). In the 8086-like modes, the locations are in 
a table called the interrupt descriptor table. Its base address is in the interrupt 
descriptor table (IDT) register. The offsets for interrupt type N are 4 x N and 4 x N 
+ 1 for the instruction pointer and 4 x N + 2 and 4 x N + 3 for the code segment 
register. Figure 4-11 shows the arrangement, starting at the base address. 

Figure 4-11 
Memory map for 80386 interrupt vectors in an 8086 compatible 
mode. 


The new factor here is the IDTR. It contains both a base address and a limit as shown 
in Figure 4-12. RESET makes the base address 0 and the limit 3FFH for compatibility 
with the 8086. That is, the default is for the interrupt descriptor table to occupy the 
bottom 1K of memory. You can use the LIDT instruction to move the table. It loads a 
6-byte operand (limit and base) into the IDTR. Normally, of course, only an 
operating system would use LIDT. 

Figure 4-12 
Data format for the interrupt descriptor table register (IDTR). 
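
The offsets within the table are simple to compute. The C sketch below follows the 8086-like layout given in step 3 (4 bytes per interrupt type, instruction pointer first) and treats the limit as the offset of the last valid byte, as the RESET value 3FFH for a 1K table implies. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the new instruction pointer for interrupt type N: 4 x N. */
uint32_t vector_ip_offset(unsigned type)
{
    assert(type <= 255);
    return 4u * type;
}

/* Offset of the new code segment value for interrupt type N: 4 x N + 2. */
uint32_t vector_cs_offset(unsigned type)
{
    assert(type <= 255);
    return 4u * type + 2u;
}

/* An entry is usable only if all four of its bytes lie within the limit. */
int vector_within_limit(unsigned type, uint32_t limit)
{
    return vector_ip_offset(type) + 3u <= limit;
}
```

With the default limit of 3FFH, all 256 interrupt types fit. A limit of FFH would cut the table off after type 63.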

Interrupt and exception service routines often use the following special instructions: 

CLI - clear interrupt flag (disable interrupts) 

IRET - return from interrupt; restore the flags, instruction pointer, and 
code segment register from the stack, thus undoing the interrupt 
response 

STI - set interrupt flag (enable interrupts) 

Since interrupts and exceptions can occur at any time, their service routines must 
run transparently. That is, they must not directly change any registers or flags that the 
main program may be using. Yet they must still do their job and indicate that it has 
been done properly. Their situation is like that of a janitor who must clean an office 
without moving anything or disturbing any work in progress. Typical instructions are, 
“Don't touch any of those stacks of printouts. And don't wake up the programmer who 
is sleeping in the middle of them. But be sure to vacuum properly (if you can see any 
carpet) and empty the wastebaskets (if you can find them).” 

Service routines generally satisfy these requirements by saving registers in the stack 
initially. They must also use the stack or memory locations for temporary storage and 
results. 


SAMPLE INTERRUPT SERVICE ROUTINES 


1. Read a character from an interrupt-driven keyboard at port KBD. Put its value in 
memory location KCHAR and set memory location KFLAG to 1: 

        PUSH  EAX                  ;SAVE ACCUMULATOR 
        PUSH  EDX                  ;SAVE REGISTER EDX 
        MOV   DX,KBD               ;ACCESS KEYBOARD PORT 
        IN    AL,DX                ;GET KEYBOARD INPUT 
        MOV   [KCHAR],AL           ;SAVE KEYBOARD INPUT 
        MOV   BYTE PTR [KFLAG],1   ;SET KEYBOARD FLAG 
        POP   EDX                  ;RESTORE REGISTER EDX 
        POP   EAX                  ;RESTORE ACCUMULATOR 
        IRET 


Note the following features of this routine: 

a. It saves and restores the registers that it uses. Remember that the interrupt 
response saves the flags automatically. Obviously, most I/O service routines must 
save EAX and EDX, since the IN and OUT instructions use them. 

b. PUSHAD and POPAD can save and restore the entire register set. They may be 
useful even though they take extra time and stack space. Using them makes it 
unnecessary to determine which registers the routine uses. Nor do you have to 
change the saving and restoring sections if you modify a routine. 

c. Information that the main program needs must be saved in memory. This includes 
pointers, data, and flags that indicate whether the data is valid. You may compare 
the memory locations (often called mailboxes) to the drops used in spy novels. 
Obviously, other programs must not use these locations. After all, you don’t want 
your valuable information to end up in someone's junk mail or to be covered by 
the garbage from the local fish market. 

d. The routine must restore registers in the opposite of the order in which it saved 
them. POPAD handles ordering automatically. 

e. IRET ends the service routine and transfers control back where it was. It restores 
the flags as well as the instruction pointer and code segment register. Note that 
IRET reenables interrupts if they were enabled originally, since it restores the 
original flag values. 

f. This unbuffered routine is like a peripheral with no local memory. If a second 
interrupt occurs before the first one is serviced, the data is simply lost. We could 
provide an overrun indicator by testing KFLAG before setting it. 

g. The main program checks for a character by testing KFLAG. The only difference 
between this and polling is the activation by an interrupt. 
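
The mailbox arrangement, with the overrun indicator suggested above, can be modeled in C as follows. This is a sketch of the logic only, not a real service routine: the overrun field and the function names kbd_service and kbd_fetch are our additions, and the model ignores the need to keep an interrupt from arriving in the middle of kbd_fetch.

```c
#include <assert.h>
#include <stdint.h>

struct kbd_mailbox {
    volatile uint8_t kchar;    /* KCHAR: last character received          */
    volatile uint8_t kflag;    /* KFLAG: 1 = character waiting            */
    volatile uint8_t overrun;  /* hypothetical addition: 1 = data was lost */
};

/* What the service routine does, with the overrun test added. */
void kbd_service(struct kbd_mailbox *mb, uint8_t data)
{
    assert(mb != 0);
    if (mb->kflag)             /* previous character not yet collected */
        mb->overrun = 1;
    mb->kchar = data;
    mb->kflag = 1;
}

/* What the main program does: returns 1 and the character if one waits. */
int kbd_fetch(struct kbd_mailbox *mb, uint8_t *ch)
{
    assert(mb != 0 && ch != 0);
    if (!mb->kflag)
        return 0;
    *ch = mb->kchar;
    mb->kflag = 0;
    return 1;
}
```

If two characters arrive before the main program calls kbd_fetch, only the second survives, but the overrun indicator records the loss.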


2. Write a character to an interrupt-driven printer at port PRINTER. Get the 
character from the buffer addressed via a pointer at address PRBUFR. Increase the pointer 
by 1 and decrease the count in location PRCNT by 1. If PRCNT is originally 0, put 
1 in location PRFLG to indicate that the printer is ready but has not been serviced: 

        PUSH  EAX                  ;SAVE ACCUMULATOR 
        PUSH  EDX                  ;SAVE REGISTER EDX 
        PUSH  ESI                  ;SAVE REGISTER ESI 
        MOV   AL,[PRCNT]           ;GET BUFFER LENGTH 
        TEST  AL,AL                ;TEST BUFFER LENGTH 
        JZ    MARKPR               ;EXIT IF NOTHING IN BUFFER 
        CLD                        ;SELECT AUTOINCREMENTING 
        MOV   ESI,[PRBUFR]         ;GET BUFFER POINTER 
        MOV   DX,PRINTER           ;ACCESS PRINTER PORT 
        OUTSB                      ;SEND DATA, UPDATE POINTER 
        MOV   [PRBUFR],ESI         ;SAVE NEW BUFFER POINTER 
        DEC   BYTE PTR [PRCNT]     ;SUBTRACT 1 FROM BUFFER LENGTH 
        MOV   BYTE PTR [PRFLG],0   ;INDICATE PRINTER SERVICED 
                                   ; IN CASE READY EARLIER 
        JMP   ENDSR 
MARKPR: MOV   BYTE PTR [PRFLG],1   ;INDICATE PRINTER READY 
ENDSR:  POP   ESI                  ;RESTORE REGISTERS 
        POP   EDX 
        POP   EAX 
        IRET 

Note the following features of this routine: 

a. Output service routines must handle the case in which there is no data to send. 
This problem does not occur with input devices, since they always have data when 
they request service. One solution is to set a flag indicating that an interrupt has 
occurred but has not been serviced. The main program may then later send the 
data without waiting for an interrupt. In practice, of course, the service routine 
must also either clear or disable the interrupt. 

b. A buffered service routine acts much like a peripheral with its own local memory. 
The processor can put a large amount of data in the buffer, and the peripheral can 
then accept it at its own pace. 

c. A common approach in practice is for the buffer to have two pointers. One, the 
tail pointer, contains the address of the next empty location. The main program 
uses this pointer to put characters in the buffer. The other, the head pointer, contains 
the address of the oldest filled location. The service routine uses this pointer to send 
characters from the buffer. The part of the buffer filled extends from the head pointer 
to just before the tail pointer. This part can be anywhere physically. It can even 
extend from the end of the buffer back past the beginning (like a television picture with 
the vertical hold off kilter). This feature (called wraparound) applies if the routines 
set the pointers back to the base address when they reach the end of the buffer. 

Figure 4-13 
Block diagram of the 8259 Programmable Interrupt Controller 
(PIC). Reprinted courtesy of Intel Corporation, Santa Clara, 
California. 
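
A buffer with head and tail pointers and wraparound might look like this in C. The sketch uses array indexes rather than addresses, an arbitrary 8-byte buffer, and the common convention of leaving one location unused so that a full buffer can be told from an empty one. These choices and all the names are ours, not part of the routine above.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u   /* hypothetical buffer length */

struct ring {
    uint8_t  data[RING_SIZE];
    unsigned head;   /* index of the oldest filled location */
    unsigned tail;   /* index of the next empty location    */
};

int ring_is_empty(const struct ring *r) { return r->head == r->tail; }

/* One location stays unused so that "full" differs from "empty". */
int ring_is_full(const struct ring *r)
{
    return (r->tail + 1u) % RING_SIZE == r->head;
}

/* Main program: put a character at the tail. Returns 0 if the buffer is full. */
int ring_put(struct ring *r, uint8_t ch)
{
    assert(r != 0);
    if (ring_is_full(r))
        return 0;
    r->data[r->tail] = ch;
    r->tail = (r->tail + 1u) % RING_SIZE;   /* wrap back to the base */
    return 1;
}

/* Service routine: take the oldest character. Returns 0 if none is left. */
int ring_get(struct ring *r, uint8_t *ch)
{
    assert(r != 0 && ch != 0);
    if (ring_is_empty(r))
        return 0;
    *ch = r->data[r->head];
    r->head = (r->head + 1u) % RING_SIZE;   /* wrap back to the base */
    return 1;
}
```

After a few characters have been put in and taken out, the tail can end up below the head. The filled part then runs from the head to the end of the buffer and continues from the base, which is the wraparound case.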


INTERRUPT CONTROLLER 


In practice, interrupt systems require external hardware. This hardware must: 

- Generate a single interrupt signal from many inputs 

- Select the input that has the highest priority for recognition 

- Provide the interrupt type (often called vector) when the CPU asks for it 

The device generally used for this purpose in 80386-based computers is the 8259 
programmable interrupt controller or PIC. Figure 4-13 is a block diagram of the 8259 
PIC. Its main features are the following: 

- An 8-level priority controller (expandable to 64 levels by adding more 
devices). 

- Individual request masks. They allow you to block any input at any time. 

- Variety of programmable operating modes including fully nested, 
automatic rotation, and specific rotation. Rotating priorities help avoid 
the situation in which a low-priority interrupt is ignored virtually forever 
because of a series of high-priority interrupts. The predicament is like 
that of a telephone caller who is left on hold indefinitely, listening to some 
awful recorded music. 

Figure 4-14 
Initialization sequence for the 8259 Programmable Interrupt 
Controller (PIC). Reprinted courtesy of Intel Corporation, Santa Clara, 
California. 


ICW1 
- 1 = ICW4 NEEDED, 0 = NO ICW4 NEEDED 
- 1 = SINGLE, 0 = CASCADE MODE 
- CALL ADDRESS INTERVAL: 1 = INTERVAL OF 4, 0 = INTERVAL OF 8 
- 1 = LEVEL TRIGGERED MODE, 0 = EDGE TRIGGERED MODE 
- A7-A5 OF INTERRUPT VECTOR ADDRESS (MCS-80/85 MODE ONLY) 

ICW2 
- A15-A8 OF INTERRUPT VECTOR ADDRESS (MCS-80/85 MODE) 
- T7-T3 OF INTERRUPT VECTOR ADDRESS (8086/8088 MODE) 

ICW3 
- 1 = IR INPUT HAS A SLAVE, 0 = IR INPUT DOES NOT HAVE A SLAVE 
- SLAVE ID (NOTE 1) 

ICW4 
- 1 = AUTO EOI, 0 = NORMAL EOI 
- 1 = 8086/8088 MODE, 0 = MCS-80/85 MODE 
- NON BUFFERED MODE, BUFFERED MODE/SLAVE, BUFFERED MODE/MASTER 
- 1 = SPECIAL FULLY NESTED MODE, 0 = NOT SPECIAL FULLY NESTED MODE 

NOTE 1: SLAVE ID IS EQUAL TO THE CORRESPONDING MASTER IR INPUT. 

Figure 4-15 
Initialization command word (ICW) format for the 8259 Programmable 
Interrupt Controller (PIC). Reprinted courtesy of Intel 
Corporation, Santa Clara, California. 

Figure 4-16 
Operation command word (OCW) format for the 8259 Programmable 
Interrupt Controller (PIC). Reprinted courtesy of Intel 
Corporation, Santa Clara, California. 

Figure 4-17 
Block diagram of the 8237 Programmable DMA Controller 
(DMAC). Reprinted courtesy of Intel Corporation, Santa Clara, 
California. 

Figure 4-14 is a flowchart of an initialization sequence for an 8259 PIC. Figure 4- 
15 contains the initialization command word format, and Figure 4-16 contains the 
operation command word format. In most cases, of course, only an operating system 
or overall I/O startup routine ever initializes an 8259 PIC. 

A key feature of an 8259 PIC is that it requires a nonspecific end-of-interrupt (EOI) 
command sent to its lower addressed port (A0 = 0). This command clears the In 
Service bit, allowing further interrupts. Thus service routines for 8259-based interrupts 
must end with the sequence 

        MOV   DX,PICO       ;ACCESS PIC PORT 
        MOV   AL,EOI        ;CLEAR 8259 INTERRUPT 
        OUT   DX,AL 

The EOI command is 20 hex, and PICO is the port address. This sequence should come 
just before the restoring of the registers at the end of the service routine. Sending EOI 
commands is the only contact most applications programmers have with an 8259 PIC. 
Beware, however: service routines for most computers will not work without an EOI 
command at the end. 
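
To see why a missing EOI blocks further interrupts, it may help to model the controller's bookkeeping. The C sketch below assumes the fully nested mode, in which input 0 has the highest priority and input 7 the lowest. It represents the pending requests, the request masks, and the In Service bits as three bytes; the function names are invented, and the real device has many more options.

```c
#include <assert.h>
#include <stdint.h>

/* Bit that represents one of the eight interrupt inputs. */
uint8_t pic_level_bit(unsigned level)
{
    assert(level <= 7);
    return (uint8_t)(1u << level);
}

/* Highest-priority input that is requesting and not masked, or -1. */
int pic_highest_request(uint8_t requests, uint8_t masks)
{
    uint8_t pending = (uint8_t)(requests & (uint8_t)~masks);
    unsigned level;
    for (level = 0; level < 8; level++)
        if (pending & pic_level_bit(level))
            return (int)level;
    return -1;
}

/* Nonspecific EOI: clear the highest-priority In Service bit. */
uint8_t pic_nonspecific_eoi(uint8_t in_service)
{
    unsigned level;
    for (level = 0; level < 8; level++)
        if (in_service & pic_level_bit(level))
            return (uint8_t)(in_service & (uint8_t)~pic_level_bit(level));
    return in_service;
}
```

Until the service routine sends the EOI, the In Service bit for its level stays set, and the controller holds back requests of the same or lower priority.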
Figure 4-18 


8237 DMAC Command, Mode, and Request registers. Reprinted 
courtesy of Intel Corporation, Santa Clara, California. 

Figure 4-19 
Word count and address register command codes for the 8237 
programmable DMA controller. Reprinted courtesy of Intel 
Corporation, Santa Clara, California. 


DIRECT MEMORY ACCESS 


Direct memory access is the fastest I/O method, but it requires the most external 
hardware. An external controller must: 

- Manage memory addressing. 

- Keep track of the number of transfers performed. 

- Control the CPU while the transfers occur. This involves forcing the CPU 
to enter a suspended state and stay in it until the operation ends. 
Figure 4-20 
Block diagram of the 82258 advanced direct memory access 
coprocessor (ADMA). 


Many older computers use the 8237 programmable DMA controller (DMAC). This 
device (see Figure 4-17 for a block diagram) has the following features: 
- Four independent DMA channels 
- Independent control of DMA requests and channel initialization 
- Single transfer, block transfer, or demand transfer operating modes 

Figure 4-18 shows the 8237 DMAC’s command, mode, and request registers. Figure 
4-19 lists its command codes. 

Newer systems with larger address spaces use the 82258 advanced direct memory 
access coprocessor (ADMA). This device also has four independently programmable 
channels. However, it also provides for chaining of commands and data, thus permit- 
ting independent processing and allowing data blocks to be scattered anywhere in 
memory. Figure 4-20 is a block diagram of the 82258 ADMA. This device is virtually 
a CPU on its own rather than just a simple controller. 

Most 80386 I/O sections use programmable I/O chips to provide flexibility and to 
save on power and board space. Common chips include the 8250 serial interface (ACE), 
the 8255 parallel interface (PPI), and the 8253 or 8254 timer (PIT). These devices can 
handle a wide range of applications, but there are no standards for their use or 
programming. 

Interrupt-driven I/O depends on the 80386’s nonmaskable and maskable interrupt 
inputs. Maskable interrupts require additional identifying information (called a vector 
or interrupt type). In response to these inputs, the 80386 saves the flags, instruction 
pointer, and code segment register in the stack. It then obtains new instruction pointer 
and code segment register values from an interrupt descriptor table in memory. The 
interrupt descriptor table register contains the table’s base address and upper limit. 


One must have a good memory to be able 
to keep the promises one makes. 
Nietzsche, Human, All Too Human 


A very fair scholar I was too, no thought 
but a great memory. 
Beckett 


This chapter describes 80386 memory management techniques. It first explains the 
processor’s operating modes and then covers segmentation, paging, memory protec- 
tion, and initialization of memory management systems. 


80386 HIGHLIGHTS 


The 80386’s new operating mode is the virtual 8086 mode. It lets 8086 programs run 
simultaneously with programs that use advanced 80386 features such as protection, 
paging, and very large segments. It bridges the gap between popular 8086-based 
(particularly MS-DOS) software and the new features of the 80386’s protected mode. 

The 80386’s major new memory management features are: 

- It allows segments as large as 4 Gb (about 4 billion bytes). This 
compares with the 64-Kb limit on previous processors. The result is that 
almost any applications program can fit in a single 80386 segment. There 
is no need to separate program and data areas or to divide areas into 
segments. Segmentation can thus be practically invisible on an 80386. It 
does, however, provide a convenient platform for implementing 
protection. 

- It provides on-chip support for paging. The 80386 can divide logical 
memory into 4-Kb pages. It will then automatically use a page directory 
and a page table to look up the physical mapping of an address and 
determine whether it is currently in memory or on disk. The 80386 also has 
an on-chip cache that holds the mappings of the most recently accessed 
pages for the current task. Demand-paging thus becomes both fast and 
easy to implement. 

Of course, access to these features requires a 386-specific operating system such as 
XENIX V/386 (Microsoft) or System V/386 (Intel). Access to the virtual 8086 mode 
requires a virtual 8086 monitor such as VP/ix (Interactive Systems) or OS/Merge 386 
(Locus Computing). 386-specific features are not accessible under an operating sys- 
tem that must also run on 8086/8088-based computers (MS-DOS) or on 80286-based 
computers (OS/2). 


MEMORY MANAGEMENT 


The idea of memory management is relatively new to personal computers. Large
computers, on the other hand, have used it for decades. Its aim is to make efficient use of
a memory section consisting of:

• High-speed, expensive memory (called a cache)

• Standard (main) memory

• Backup (secondary) storage, usually magnetic disk

Figure 5-1 shows a typical memory section with a memory management unit
(MMU). The MMU converts addresses as the program sees them (logical addresses)
into those needed to access actual memory locations (physical addresses). The
conversion is almost invisible as far as programmers are concerned. You may compare the
MMU to the electromechanical system that converts an automobile driver’s control
actions into the actual mechanical operations that run the car. Most drivers neither
understand the conversion nor care about how it works.

Figure 5-1
A memory section with a memory management unit.


The MMU also allows any reasonable division of the memory section into cache,
main memory, and secondary storage. The tradeoff here is obvious. Faster storage costs
more per unit. A computer with more fast memory will execute programs faster than
one that must use main memory and secondary storage more often. The management
process itself also takes some time, although the 80386 masks this by doing it in
parallel with other operations.

The 80386 has an on-chip MMU, as do the 80286 and Motorola 68030. On the other 
hand, the Motorola 68000, 68010, and 68020 use a separate MMU. The tradeoff here 
is also simple. An on-chip MMU is faster but cannot have all the features of a separate
device. 

The rest of this chapter describes 80386 memory management. Chapter 8 discusses 
cache memory. 


OPERATING MODES 


The 80386 can operate in either of two main modes: 


1. Real mode 
2. Protected (virtual) mode 


In the real mode, used mostly for startup, the 80386 acts like a 32-bit version of the
8086 processor. In the protected mode, the usual operating mode, the 80386 acts like
a 32-bit version of the 80286 processor. The choice between the two modes depends
on bit 0 of control register 0 (the protection enable or PE bit). The selection is
systemwide, rather than on a task-by-task basis. Changing from real to protected mode is a
drastic action that a system usually does just once. You may compare it to moving from
adolescence to adulthood. Anyone who wants to do that twice has been watching too
many movies!

Real mode has the following characteristics: 

• Segmentation done just as on the 8086 (see the description later in this
chapter).

• 1 Mb plus 64-Kb memory address space. The extra 64K comes from the
carry in the segmentation calculation. The 8086 simply drops the carry,
so its address space is exactly 1 Mb.

• Effective addresses (within a segment) limited to 64K.

• Operands and addresses are 16 bits by default. However, you can apply
32-bit overrides to either or both.

Real mode is thus not exactly like an 8086, as the 80386’s 32-bit capabilities and 
new applications-oriented instructions remain available. You would need to use either 
assembly language or a special compiler with an 80386 mode to access them. 

In protected mode, the 80386 has a submode called virtual 8086 mode. It is selected
by setting the Virtual Mode (VM) flag, bit 17 of the extended flags (Fig. 2-3), to 1. In
V86 mode, as in real mode, the 80386 acts much like an 8086. The difference between
the modes is that the processor can enter V86 mode on a task basis. That is, any task
that initializes the extended flags can also choose between V86 mode and protected
mode. Thus a system can run, either “simultaneously” (under multitasking) or
sequentially, both 8086 programs in V86 mode and generalized 80386 programs in protected
mode. The switch to V86 mode has no drastic effects on memory management or
address generation. You may compare it to playing juvenile games without going through
a time warp.

Note that there is no virtual 80286 mode, perhaps reflecting the small amount of
unique 80286 software. In practice, the 80386 can run 80286 code directly with a few
minor exceptions. However, the programmer must produce 80286 emulation by
limiting address and operand size, trapping or restricting the use of 80386 features, and
avoiding new 80386 instructions. For example, this is necessary to create programs
that will run under OS/2 or XENIX V/286.

OS/2 can switch back and forth between real and protected modes. This is how it 
runs MS-DOS applications. It must use real mode rather than virtual 8086 mode so that 
it can run on 80286-based computers. The problem here is that real mode provides no 
protection from ill-behaved applications. They can manipulate hardware directly, 
change interrupt vectors, or alter operating system tables or parameters. The result is 
like a secure installation that has occasional “open houses.” 


SEGMENTATION


Segmentation is a way to divide program and data areas into subunits called segments. 
They have the following characteristics: 

• Dedication to specific program functions such as code, data, or stack
• Any length up to a maximum
• Apply to logical memory as the programmer sees it, not to physical
memory

The 80386 lets a program access up to six segments at a time through the following
registers:

• Code segment (CS), the area from which the processor is currently fetching and
executing instructions
• Stack segment (SS), the area containing the hardware stack used by subroutines and
interrupts
• Data segments (DS, ES, FS, and GS), areas used for temporary data storage

The only new 80386 features here are two more data segment registers (FS and GS). The
accessible segments are like open files in a database management system.


8086 Segmentation Methods 


8086 segments may be up to 64 Kb long. A 16-bit number defines a segment. Its base 
address is that number times 16. For example, segment 9000 hex starts at address 90000 
hex. Segments thus always start at addresses divisible by 16. 

The 8086 translates logical addresses into physical addresses as follows: 


1. Multiply the segment number by 16. 
2. Add the address within the segment (called the offset) to the product. 


The translation is the same on all 8086/8088-based computers. It is independent of the 
operating system or any other software. 
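
The two translation steps above can be sketched in a few lines of Python. This is only an illustrative model, not something the processor or any assembler provides. The function name real_mode_address and the wrap_at_1mb flag are made up for the sketch. The flag models the difference noted earlier between the 8086, which drops the carry out of the 20-bit sum, and the 80386 in real mode, which keeps it.

```python
def real_mode_address(segment, offset, wrap_at_1mb=False):
    """Return the physical address for a 16-bit segment:offset pair.

    wrap_at_1mb=True models the 8086, which drops the carry out of the
    20-bit sum. False models the 80386 in real mode, which keeps it.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    address = (segment << 4) + offset   # step 1: times 16; step 2: add offset
    if wrap_at_1mb:
        address &= 0xFFFFF              # keep only 20 address bits
    return address


if __name__ == "__main__":
    # Segment 9000 hex starts at address 90000 hex.
    print(hex(real_mode_address(0x9000, 0x0000)))
    # The highest segment:offset pair, without and with the 8086 wraparound.
    print(hex(real_mode_address(0xFFFF, 0xFFFF)))
    print(hex(real_mode_address(0xFFFF, 0xFFFF, wrap_at_1mb=True)))
```

The 4-bit left shift is the multiplication by 16. The last two calls show where the extra (almost) 64K of real-mode address space on the 80386 comes from.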
Let us see how this simple one-stage mapping function works in actual examples: 


1. Suppose an instruction gets data from address (offset) E137 hex and the data
segment register contains 1800 hex. The data’s physical address is

18000 + E137 = 26137

Note that multiplying a hex number by 16 is like multiplying a decimal number by
10. After all, 16 is 10 hex. This is obvious if you come from a computerized planet
where people have 16 fingers. An easy way to multiply by 16 is with a 4-bit (1 hex
digit) logical left shift.

2. Suppose an instruction starts at address (offset) 4A5B hex and the code segment
register contains C000 hex. The instruction’s physical address is

C0000 + 4A5B = C4A5B


Table 5-1
Default segment register selection rules

Memory Reference Needed   Segment Register Used   Implicit Segment Selection Rule
-----------------------   ---------------------   -----------------------------------
Instructions              Code (CS)               Automatic with instruction prefetch
Stack                     Stack (SS)              All stack pushes and pops. Any
                                                  memory reference that uses ESP or
                                                  EBP as a base register.
Local Data                Data (DS)               All data references except when
                                                  relative to stack or string
                                                  destination.
Destination Strings       Extra (ES)              Destination of string instructions.


Each instruction has a default segment register assignment (see Table 5-1). You can 
override the default generally by prefixing an instruction and specifying a different 
segment register. For example, the instruction 

MOV EAX,[OPER] 
loads data from offset OPER in the segment defined by DS. Similar moves with seg- 
ment overrides are 

MOV EAX,ES:[OPER] 

MOV EAX,CS:[OPER]
The first instruction gets data from the segment defined by ES, the second one from 
the code segment. 

Overrides apply even to instructions with implied memory operands as long as you 
use a form that allows for them. For example, the instructions 

XLAT
and 

XLAT [EBX] 
both do a table lookup in the data segment. The [EBX] serves no purpose, as XLAT 
uses EBX automatically and does not allow any other address. However, [EBX] is es- 
sential if you want to use a different segment. For example, the instruction 

XLAT CS:[EBX]
uses a table in the code segment. Without the [EBX], there would be no place to put
the segment override. Think of it as keeping a hitching post in front of your house, just
in case someone arrives on horseback.

The only cases in which you cannot override the default segment assignment are: 

• The use of ES for destination addresses (accessed via DI or EDI) in the string
primitives CMPS, INS, MOVS, SCAS, and STOS. Note that you can
override the source segment assignment, but not the destination. As Dr.
Seuss says about such funny things, “Don’t ask me why. Go ask your
mother.”

• The use of SS in stack instructions (PUSH, POP, CALL, RET, and so
on).

• The use of CS for instruction fetches.

Segmentation on the 8086 is thus a simple way to extend the addressing range of 
16-bit operands. The 8086 computes 20-bit addresses (allowing access to 1M of 
memory) from two 16-bit components that it can manipulate easily. There is no time 
penalty for segmentation, as it occurs in parallel with instruction execution and uses 
separate arithmetic facilities. 

Segmentation does, however, create some awkward problems. How do you handle 
programs or data areas larger than 64 Kb? What happens when you reach the end of a 
segment? There are, in general, no easy answers to these questions. Segmentation will 
therefore never be programmers’ favorite approach to memory management. In prac- 
tice, many compilers use a single-segment approach (called a small memory model) 
that limits code and data to 64 Kb each. When code or data may be larger, programs 
may run much more slowly under the multiple-segment large memory model. The 
slowdown is the result of the time spent checking for the ends of segments and chang- 
ing the set of accessible segments. 

The simple mapping procedure makes it easy to divide memory into consecutive
nonoverlapping segments. For example, we could divide the entire 1-Mb address space
into 16 nonoverlapping 64-Kb segments. The first one would be segment 0, the second
segment 1000 (starting at address 10000), the third segment 2000 (starting at address
20000), etc. This is a common way to describe the memory of an IBM PC or PC clone.
A program that needs more than 64 Kb for code or data can readily compute new
segment register values needed to access consecutive addresses. Many 8086-based
compilers use this approach to handle large programs or data areas.

Figure 5-2
How a selector is used to compute a linear address.

For example, suppose that a program has just accessed the data word at offset FFFE 
of segment 7000 hex. To reach the next higher address, assuming that the offset is in 
a base register, the program must: 


1. Add 2 to the base register. 
2. If the result is 0, add 1000 hex to the data segment register. This requires some
MOVs, as arithmetic instructions cannot operate on segment registers.


The next higher address is address 0 of segment 8000 hex. Its physical address is 80000
hex. Note that checking for a zero base register value increases the execution time of
each access.
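
The boundary check just described can be modeled as follows. This is a sketch in Python rather than the 8086 code a compiler would actually generate. The names next_word and physical are invented for the illustration, and the model assumes an even starting offset and the consecutive, nonoverlapping 64-Kb segments described above.

```python
def next_word(segment, offset):
    """Advance a real-mode segment:offset pair by one 16-bit word.

    Mirrors the two steps in the text: add 2 to the offset and, if the
    result wraps to 0, add 1000 hex to the segment value.
    """
    offset = (offset + 2) & 0xFFFF              # a 16-bit base register wraps
    if offset == 0:                             # the test made on every access
        segment = (segment + 0x1000) & 0xFFFF   # move to the next 64-Kb segment
    return segment, offset


def physical(segment, offset):
    """8086-style physical address: segment times 16 plus offset."""
    return (segment << 4) + offset


if __name__ == "__main__":
    seg, off = next_word(0x7000, 0xFFFE)
    print(hex(seg), hex(off), hex(physical(seg, off)))
```

Starting from offset FFFE of segment 7000 hex, the sketch lands on offset 0 of segment 8000 hex, physical address 80000 hex. The comparison against zero runs on every access, which is the overhead the text mentions.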

[Figure 5-3 diagram. Upper doubleword (bytes 4 through 7), left to right:
BASE 31..24 | G | X | 0 | AVL | LIMIT 19..16 | P | DPL | TYPE | BASE 23..16]
[Lower doubleword (bytes 0 through 3), left to right:
SEGMENT BASE 15..0 | SEGMENT LIMIT 15..0]

A    = ACCESSED
AVL  = AVAILABLE FOR USE BY SYSTEMS PROGRAMMERS
DPL  = DESCRIPTOR PRIVILEGE LEVEL
G    = GRANULARITY
P    = SEGMENT PRESENT

Figure 5-3 
General format for segment descriptors. 


Protected Mode Segmentation 


In the 80386’s protected mode, segmentation works differently than on an 8086. Here,
a segment register contains a selector. The selector then points to a descriptor
in a table in memory. The descriptor contains the segment’s base address, limit,
and other attributes. Figure 5-2 shows how a selector points to a descriptor.
Figure 5-3 contains the general format of segment descriptors. The fields are:

• Base address (32 bits). The processor uses it to compute an intermediate
result called a linear address, much as the 8086 uses the segment
number times 16. That is, the processor adds the offset and base address. Note
that no multiplication or shifting is necessary, and a segment can begin
anywhere.
• Segment limit (20 bits). This defines the segment’s length.
• Type (5 bits). Figure 5-4 shows the type field for data and code
(executable) segments. We will discuss other types of descriptors later in
this chapter and in Chapters 6 and 7.

[Figure 5-4 diagram: DATA SEGMENT DESCRIPTOR and executable segment descriptor
layouts, each with a TYPE field]

A    = ACCESSED                        E = EXPAND-DOWN
AVL  = AVAILABLE FOR PROGRAMMER USE    G = GRANULARITY
B    = BIG                             P = SEGMENT PRESENT
C    = CONFORMING                      R = READABLE
D    = DEFAULT                         W = WRITABLE
DPL  = DESCRIPTOR PRIVILEGE LEVEL


Figure 5-4
Format for data segment and executable (code) segment descriptors.


• Descriptor privilege level (2 bits). It indicates how trusted a code
segment is or how trusted it must be to use the descriptor. We will discuss
privilege levels later in this chapter.

• A segment present (P) bit. An operating system can use this bit to
implement virtual memory at the segment level. In practice, most systems
implement virtual memory through paging instead. A major reason is that
the variable size of segments makes swapping time consuming and
difficult to implement.

• An accessed (A) bit. The processor sets A whenever it accesses the
segment. An operating system could use it together with the P bit to
implement virtual memory. A common approach is to test and clear the A bits
regularly. The OS can then use the counts of how often each A bit was
set to identify segments that have not been used recently. Those segments
are obvious candidates for removal from physical memory.

• A granularity (G) bit. It specifies whether the limit is in units of bytes (0)
or 4 Kb (1). It thus allows a 20-bit limit field to specify segments as large
as 4 Gb. This would otherwise require 32 bits.

We will describe some bits later, such as B, E, and W in the data segment
descriptor and C and R in the executable segment descriptor. As mentioned in Chapter 2, code
segments have a D (default operand/address size) bit that determines whether the
default size for data and addresses is 16 bits (0) or 32 bits (1).

Other bits shown in Figure 5-4 either have specific values (0 or 1) or are available
for use by systems programmers (AVL). For example, one bit might identify segments
containing descriptor tables, key operating system functions, and memory-mapped I/O
devices that must remain in physical memory.

Note, in particular, how the granularity bit works. If it is 0, the segment ends at the
limit. Its size is the limit plus 1 (including the zeroth byte at the base address).
References to offsets larger than the limit cause exceptions.

If the granularity bit is 1, the segment ends at the limit shifted left 12 bits with
low-order 1 bits inserted. For example, suppose G = 1 and the limit is 10000 hex. The
segment then ends at offset 10000FFF hex and its size is 10,001,000 hex bytes. To get a
segment of size 10,000,000 hex bytes (256 Mb), you would have to specify a limit of
FFFF hex.

Thus segments can have any size up to 1 Mb. Larger segments can only have sizes 
in units of 4 Kb, such as 1 Mb + 4 Kb, 1 Mb + 8 Kb, etc. In practice, of course, seg- 
ments are usually relatively large and have sizes with large, discrete steps anyway. 
Hence, this limitation is seldom significant. If you set your heart on creating a segment 
with a length of 100,007 bytes hex, you will be cruelly disappointed. 
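
The granularity arithmetic is easy to check with a short sketch. The function names segment_extent and is_representable are invented here, and the model covers only ordinary (expand-up) segments. It is a way to test the rule, not a description of how the processor is built.

```python
def segment_extent(limit, granularity):
    """Return (last_valid_offset, size_in_bytes) for a 20-bit limit field."""
    if not 0 <= limit <= 0xFFFFF:
        raise ValueError("limit must fit in 20 bits")
    if granularity == 0:
        last = limit                    # limit counts bytes
    else:
        last = (limit << 12) | 0xFFF    # limit counts 4-Kb units; low 12 bits are 1s
    return last, last + 1


def is_representable(size):
    """True if some limit/G pair describes a segment of exactly `size` bytes."""
    if 1 <= size <= 0x100000:           # up to 1 Mb: any size, using G = 0
        return True
    # Above 1 Mb: only multiples of 4 Kb, up to 4 Gb, using G = 1.
    return size % 0x1000 == 0 and size <= 0x100000000


if __name__ == "__main__":
    last, size = segment_extent(0x10000, 1)
    print(hex(last), hex(size))
    print(is_representable(0x100007))
```

With G = 1 and a limit of 10000 hex, the sketch reproduces the figures in the text: the last valid offset is 10000FFF hex and the size is 10001000 hex bytes. A size of 100,007 hex bytes is rejected, as promised.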

Segment descriptors must be placed in one of two kinds of descriptor tables (see 
Figure 5-5): 

• Global, that is, applying to all tasks in the system.
• Local, that is, applying only to the current task. Chapter 6 describes
tasking in detail.

The 32-bit linear base address of the global descriptor table is in the global descrip- 
tor table register (GDTR). The 32-bit linear base address of the current local descrip- 
tor table is in the local descriptor table register (LDTR). The enlightening names here 
allow for examination questions almost as profound as Groucho Marx’s “Who is buried 
in Grant’s Tomb?” Note that the base addresses are linear, not segment relative. The 
GDTR and LDTR also contain 16-bit limits (the offsets of their last valid bytes). The 
tables consist of 8-byte entries as shown in Figures 5-4 and 5-5. 

[Figure 5-5 diagram: GLOBAL DESCRIPTOR TABLE and LOCAL DESCRIPTOR TABLE]

Figure 5-5
Global and local descriptor tables.


A selector thus must both specify a table and provide an index to a descriptor in it.
Figure 5-6 shows the format of a selector. Since the table entries are 8 bytes long, only
the high 13 bits are needed for the index. Of course, the index must be
multiplied by 8 before the processor uses it to access the table. Bit 2, the table indicator,
determines whether the descriptor is in the GDT (0) or in the current LDT (1). Bits 0
and 1 are the requestor’s privilege level (RPL); we will discuss it later in this chapter.

Some example selector values are: 


1. 3005 hex. This value refers to a descriptor in the current LDT, as bit 2 = 1. The
descriptor is in linear addresses LBASE + 3000 hex through LBASE + 3007 hex,
where LBASE is the contents of the local descriptor table register. Note that LBASE
is a 32-bit linear address, not an offset.

2. AF00 hex. This value refers to a descriptor in the GDT, as bit 2 = 0. The descriptor is in
addresses GBASE + AF00 hex through GBASE + AF07 hex, where GBASE is the
contents of the global descriptor table register. Like LBASE in the previous example,
GBASE is a 32-bit linear address.

3. 0 (zero). This is the famous null selector in the GDT, as bit 2 = 0. It serves only to
identify unused segment registers and other invalid segment values. Think of it as
comparable to the null character (often designated as \0) that ends strings in
languages such as C.


[Figure 5-6 diagram: bits 15 through 3 are the index, bit 2 is the table indicator, and
bits 1 and 0 are the RPL]
RPL = REQUESTOR’S PRIVILEGE LEVEL

Figure 5-6
Format of a selector.
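
The selector fields can be pulled apart with a few shifts and masks. The sketch below is an illustration only. The function name decode_selector and the dictionary keys are invented for it. It treats any selector with index 0 in the GDT as null, whatever its RPL bits.

```python
def decode_selector(selector):
    """Split a 16-bit selector into the fields described in the text."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("a selector is 16 bits long")
    index = selector >> 3                        # high 13 bits
    table = "LDT" if selector & 0x4 else "GDT"   # bit 2, the table indicator
    rpl = selector & 0x3                         # bits 0 and 1
    return {
        "index": index,
        "table": table,
        "rpl": rpl,
        "table_offset": index * 8,               # each descriptor is 8 bytes long
        "is_null": index == 0 and table == "GDT",
    }


if __name__ == "__main__":
    for value in (0x3005, 0xAF00, 0x0000):
        print(hex(value), decode_selector(value))
```

For 3005 hex the sketch reports the LDT, RPL 1, and a table offset of 3000 hex, matching the first example above. Note that multiplying the index by 8 simply restores the selector with its low three bits cleared.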


How does address translation work in practice? Here are some examples: 


1. Suppose that an instruction gets data from address (offset) A5B370 hex and the data
segment register contains 150 hex. This selector refers to a descriptor in the GDT as bit
2 = 0. The descriptor is in linear addresses GBASE + 150 hex through GBASE +
157 hex, where GBASE is the contents of the global descriptor table register.
Suppose that the base address in the descriptor is 4F1100 hex. The data’s linear address
(which is also its physical address if paging is off) is

A5B370 + 4F1100 = F4C470

Note that no multiplication or shifting is necessary. There is no simple arithmetic
relationship between the data segment register’s contents and the segment’s base
address. Furthermore, the translation depends on what an operating system puts in
the global descriptor table.

2. Suppose that an instruction starts at address (offset) 6D11F3 hex and the code
segment register contains 2004 hex. This selector refers to a descriptor in the current
LDT as bit 2 = 1. The descriptor is in linear addresses LBASE + 2000 hex through
LBASE + 2007 hex, where LBASE is the contents of the local descriptor table
register. Suppose that the base address in the descriptor is 117E00 hex. The instruction’s
linear address (again, its physical address if paging is off) is

6D11F3 + 117E00 = 7E8FF3

Here the translation depends on which local descriptor table is currently in use.


Figure 5-7
Segment registers.
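
The two examples above can be reproduced with a toy model of the descriptor tables. Everything in the sketch is a simplification made for illustration. Each table is a Python dictionary keyed by the descriptor's byte offset in the table, each descriptor is reduced to a (base, last valid offset) pair, and the limits shown are invented, since the examples give only the base addresses. Type and privilege checks are left out.

```python
class SegmentFault(Exception):
    """Stands in for the exceptions the processor would raise."""


def linear_address(selector, offset, gdt, ldt):
    """Translate selector:offset to a linear address using toy tables."""
    uses_ldt = bool(selector & 0x4)           # bit 2, the table indicator
    table_offset = selector & 0xFFF8          # index times 8
    if not uses_ldt and table_offset == 0:
        raise SegmentFault("the null selector cannot be used for access")
    table = ldt if uses_ldt else gdt
    if table_offset not in table:
        raise SegmentFault("no descriptor at that table offset")
    base, last_offset = table[table_offset]
    if offset > last_offset:
        raise SegmentFault("offset is beyond the segment limit")
    return (base + offset) & 0xFFFFFFFF       # no multiplication or shifting


# Base addresses are from the examples in the text; the limits are made up.
EXAMPLE_GDT = {0x150: (0x4F1100, 0xFFFFFF)}
EXAMPLE_LDT = {0x2000: (0x117E00, 0xFFFFFF)}

if __name__ == "__main__":
    print(hex(linear_address(0x150, 0xA5B370, EXAMPLE_GDT, EXAMPLE_LDT)))
    print(hex(linear_address(0x2004, 0x6D11F3, EXAMPLE_GDT, EXAMPLE_LDT)))
```

Changing what the tables contain changes the result for the same selector and offset, which is the point the examples make about the operating system's role.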


An important fact to remember is that the processor does not use the GDT’s first
entry (index 0). This allows the system to load the zero (null) selector into a data
segment register (DS, ES, FS, or GS) without causing an exception. Note that zero refers
to a selector in the global table, as bit 2 = 0. The zero selector is always valid for loading
into these registers but never for use. Erroneous references to cleared segment registers
thus cause exceptions. The zeroth entry is valid in local descriptor tables.

Selectors do not actually occupy entire segment registers. In fact, the registers (see 
Figure 5-7) have a visible part containing the selector and a hidden part containing the 
base address, limit, type, and other attributes obtained from the descriptor table. You 
may compare the hidden part to the attributes of a file that you select by name. The 
terms visible and hidden refer to the programmer’s point of view. Common instruc- 
tions such as MOV manipulate only the 16-bit “visible” parts. You can, however, 
retrieve the segment limit with the LSL (load segment limit) instruction and the access 
rights with the LAR (load access rights) instruction. 

In general, the hidden parts are freeloaders that “come along for the ride.” The 
processor loads one automatically when it loads the selector. Having the hidden part 
on chip saves memory accesses each time the processor uses the selector. That is, the 
processor need not read the descriptor to compute the linear address or to do validity 
checks. The information it needs is available in the hidden part of the segment register.

In this indirect mapping, successive selectors or ones with only slightly different
values may produce completely different linear addresses. For example, remember that
in the 8086 segments A000 and B000 follow each other in physical memory (assuming
that both are 64 Kb long) just as in logical memory. In the 80386, on the other hand,
segments A000 and B000 refer to different entries in the global descriptor table. Their
base addresses, limits, and attributes are totally independent. Thus, in the protected
mode, there is no simple way to compute the linear address for the next higher logical
address in a new segment. Furthermore, programs that assume simple mappings when
handling large code or data areas or refer to physical addresses will not work in
protected mode. This is one reason why MS-DOS programs often cannot run in
protected mode on either the 80286 or the 80386.


PAGING 


Unlike segmentation, paging is optional on the 80386. The PG bit (bit 31 of control 
register 0) determines whether it is activated. This bit must be 1 to activate paging. Like 
the choice between real and protected modes, the selection of paging is a systemwide 
decision. The switch to paging is a drastic step that a system would undertake just once
at startup. 

The operating system can only set the PG bit in the protected mode. It must there- 
fore initialize PG after or at the same time as it sets the PE bit to enter protected mode. 
There is no paging in real mode but there can be in V86 mode. Paging in V86 mode 
allows several users to have their own separate “virtual” 8086-based machines. Note 
the obvious advantage of the more flexible mapping method here. 
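
The mode-selection rules from this chapter fit in a small decision function. The sketch below is a summary aid only. The names CR0_PE, CR0_PG, EFLAGS_VM, and describe_mode are chosen for the illustration, and the model ignores the VM flag whenever PE is clear.

```python
CR0_PE = 1 << 0        # protection enable: bit 0 of control register 0
CR0_PG = 1 << 31       # paging: bit 31 of control register 0
EFLAGS_VM = 1 << 17    # virtual 8086 mode: bit 17 of the extended flags


def describe_mode(cr0, eflags=0):
    """Name the operating mode implied by CR0 and the extended flags."""
    protected = bool(cr0 & CR0_PE)
    paging = bool(cr0 & CR0_PG)
    if paging and not protected:
        # The text notes that PG can be set only in protected mode.
        raise ValueError("PG cannot be set while PE is clear")
    if not protected:
        return "real mode"
    mode = "V86 mode" if eflags & EFLAGS_VM else "protected mode"
    return mode + (" with paging" if paging else "")


if __name__ == "__main__":
    print(describe_mode(0))
    print(describe_mode(CR0_PE))
    print(describe_mode(CR0_PE | CR0_PG, EFLAGS_VM))
```

PE and PG are systemwide choices, so they come from CR0. VM comes from the flags, which is why V86 mode can differ from task to task.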

On the 80386, paging occurs after segmentation as shown in Figure 5-8. Segmen- 
tation converts the logical address into the intermediate linear address. Paging then 
converts the linear address into a physical address. Both conversions involve a table 
lookup. 


Page Translation 
Page translation involves two levels of tables. They are: 


1. Page directories, which contain the physical base addresses of page tables. A page
directory is like an index to a set of maps or local telephone directories. Page
directories allow a system to have several page tables, thus keeping users or tasks
separated.

2. Page tables, which contain the physical base addresses of pages.


A page directory is just like a page table except for the kind of entries it contains.


Figure 5-8
An overview of 80386 address translation in protected mode.


To compute the physical address, the processor divides the linear address into three
fields as shown in Figure 5-9:

• Bits 22 through 31 are the page directory index. The processor uses it to
select the base address of the page table from the page directory. The
10-bit field means that a directory can have up to 1K (1024) page table
entries.
• Bits 12 through 21 are the page table index. The processor uses it to select
an entry from the page table. The 10-bit field means that a page table can
have up to 1024 page entries.
• Bits 0 through 11 are the offset on the page. The processor adds it to the
page frame address in the page table entry much as it adds an offset to a
segment base address during segmentation. Note that the two offsets are not
the same unless the segment’s base address is on a 4K page boundary.

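
The three-way split of a linear address takes only shifts and masks. The function name split_linear is invented for this sketch, and the sample address 403025 hex was picked so that each field comes out small and easy to read.

```python
def split_linear(linear):
    """Return (page directory index, page table index, offset on the page)."""
    if not 0 <= linear <= 0xFFFFFFFF:
        raise ValueError("a linear address is 32 bits long")
    directory_index = (linear >> 22) & 0x3FF   # bits 22 through 31
    table_index = (linear >> 12) & 0x3FF       # bits 12 through 21
    offset = linear & 0xFFF                    # bits 0 through 11
    return directory_index, table_index, offset


if __name__ == "__main__":
    fields = split_linear(0x00403025)
    print([hex(field) for field in fields])
```

For 403025 hex the fields are 1, 3, and 25 hex. The two 10-bit masks (3FF hex) correspond to the 1024 entries that a page directory or page table can hold.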

[Figure 5-9 diagram: bits 31 through 22 are the page directory index, bits 21 through 12
are the page table index, and bits 11 through 0 are the offset]

Figure 5-9
Format of a linear address.


Paging is thus a multistage process. To implement it, the processor needs:


1. The base address of the current page directory in register CR3 (also called the page 
directory base register). This address can be a systemwide constant or it can change 
on a task-by-task basis. 

2. The base address of the current page table. The processor obtains it from the cur- 
rent page directory by indexing with bits 22 through 31 of the linear address. 

3. The page table entry. The processor obtains it from the current page table by index- 
ing with bits 12 through 21 of the linear address. 


As shown in Figure 5-10, the physical address is then the sum of the offset (bits 0
through 11 of the linear address) and the page frame address from the page table entry.
This addition is the final step in address translation.

Figure 5-10
The page translation process.

Figure 5-11
Format of a page table entry.


Let us look at some examples:


1. Suppose that the linear address is 1F3A1 hex. It consists of the following fields (see
Figure 5-9):

• The page directory index (bits 22 through 31) is 0.
• The page table index (bits 12 through 21) is 1F hex.
• The offset on the page (bits 0 through 11) is 3A1 hex.

Paging then proceeds as follows:

a. The processor reads the page table’s base address (PTBASE) from locations
DIRBASE through DIRBASE + 3. DIRBASE is the contents of the page directory
base register (control register 3).

b. The processor reads the page table entry from locations PTBASE + 7C hex
through PTBASE + 7F hex. 7C is the page table index (1F) times 4, as each entry
is 4 bytes long. If you feel a need to check the arithmetic here, we strongly
suggest using a hex calculator. Assume that the upper 20 bits of the entry (see Figure
5-11) are PFRAME.

c. The processor computes the physical address by adding 3A1 to PFRAME
extended with 12 low-order 0 bits. If, for example, PFRAME = 6C, the physical
address is 6C3A1.

2. Suppose that the linear address is C31B79 hex (also a great stage name for a robot). It
divides as follows:

• The page directory index is 3 (the two upper bits of the most significant
digit). Note that the break between page directory index and page table
index occurs in the middle of a hex digit, since the fields are both 10 bits
long.
• The page table index is 31 hex.
• The offset on the page is B79 hex.

Paging then proceeds as follows:

a. The processor reads the page table’s base address (PTBASE) from locations
DIRBASE + C through DIRBASE + F. DIRBASE is the contents of the page
directory base register. C is the page directory index (3) times 4 in hex, as each entry
is 4 bytes long.

b. The processor reads the page table entry (PFRAME is the upper 20 bits) from
locations PTBASE + C4 hex through PTBASE + C7 hex. C4 is the page table
index (31 hex) times 4.

c. The processor computes the physical address by adding B79 to PFRAME
extended with 12 low-order 0 bits.

Fortunately, only operating system developers would ever have to check these
calculations in practice.
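
For readers who would rather not reach for a hex calculator, the whole two-level lookup can be simulated. The sketch below is a toy model under stated assumptions. Physical memory is a Python dictionary holding 32-bit entries, the names translate, read_entry, and PageFault are invented, and the values of DIRBASE, the two page table bases, and the second PFRAME (123 hex) are made up. Only PFRAME = 6C comes from the first example.

```python
class PageFault(Exception):
    """Raised when a directory or table entry has its present bit clear."""


def read_entry(memory, address):
    """Fetch a 32-bit table entry; missing entries read as 0 (not present)."""
    entry = memory.get(address, 0)
    if (entry & 1) == 0:
        raise PageFault(f"entry at {address:#x} is not present")
    return entry


def translate(linear, dirbase, memory):
    """Convert a linear address to a physical address by a two-level lookup."""
    directory_index = (linear >> 22) & 0x3FF
    table_index = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF
    directory_entry = read_entry(memory, dirbase + directory_index * 4)
    ptbase = directory_entry & 0xFFFFF000        # page table's base address
    table_entry = read_entry(memory, ptbase + table_index * 4)
    pframe = table_entry >> 12                   # upper 20 bits of the entry
    return (pframe << 12) + offset               # PFRAME extended with 12 zeros


DIRBASE = 0x1000    # hypothetical contents of control register 3
EXAMPLE_MEMORY = {
    DIRBASE + 0x0: 0x2000 | 1,         # directory entry 0: page table at 2000 hex
    0x2000 + 0x7C: (0x6C << 12) | 1,   # table entry 1F hex: PFRAME = 6C
    DIRBASE + 0xC: 0x3000 | 1,         # directory entry 3: page table at 3000 hex
    0x3000 + 0xC4: (0x123 << 12) | 1,  # table entry 31 hex: PFRAME = 123 (made up)
}

if __name__ == "__main__":
    print(hex(translate(0x1F3A1, DIRBASE, EXAMPLE_MEMORY)))
    print(hex(translate(0xC31B79, DIRBASE, EXAMPLE_MEMORY)))
```

The first call follows example 1 and yields 6C3A1 hex. The second follows example 2 and, with the made-up PFRAME of 123 hex, yields 123B79 hex. Any linear address whose entries are absent from the dictionary raises PageFault, a rough stand-in for a page fault.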


Page Tables 


Note the following about page tables: 


1. They contain 32-bit elements consisting of the physical base address of a 4K block
of memory (a page frame) and other attributes (see Figure 5-11). Note that page
tables (including page directories) contain physical addresses, not linear addresses.
Pages always start on 4K boundaries, so the low-order 12 bits of their base
addresses are all zeros.

2. Each one occupies a page. It can therefore hold up to 1K entries. Thus each table
can provide access to 1K pages or 4 Mb of memory. Unlike descriptor tables, page
tables are always the same size. There is no hidden limit in a page directory base
register or in a page directory entry. Unused page table entries should be cleared to
avoid errors.

3. A page directory also occupies a page. It can therefore hold up to 1K entries. Thus
each directory can provide access to the entire physical address space, 1K 4-Mb
units or 4 Gb of memory.


Figure 5-11 shows the format of a page table entry. It has (from left-to-right) the
following fields:

• A page frame address or page number. This is a page’s physical base
address.

• The D (dirty) bit. The processor sets D whenever it writes into a page. D
thus indicates whether the operating system must save the page on disk
when it is to be swapped out of memory. If the processor has never
written on the page, the system can simply discard it. A rather disappointing
explanation for an intriguing name! Unfortunately, D bits do not apply
to politicians, White House staff members, Wall Street financiers, or
football players.

Note also that a page table entry’s D (dirty) bit is different from an executable seg- 
ment descriptor’s D (default address/operand size) bit. Only pages have dirty bits, not 
segments. However, a data segment descriptor has a bit indicating whether the segment 
is writable. 

- The A (accessed) bit. The processor sets A whenever it accesses a page.
The operating system can use the A bit to identify pages that have not
been used lately and are therefore likely candidates for swapping.

- The U/S (user/supervisor) bit. This is a protection bit (to be discussed
later) that determines whether the page is accessible by programs running
at the user privilege level or only at the supervisor level.

- The R/W (read/write) bit. This bit determines whether the page is
read-only (0) or read/write (1) at the user level. All pages are always readable
and writable at the supervisor level. Note, however, that segment-level
attributes apply first and can override page-level permissions.

- The P (present) bit. P is 1 if the page or page table is in physical memory.
The processor signals a page exception or page fault if it tries to use an
entry for which P = 0.
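As an illustration, the following Python sketch unpacks a page table entry into the fields just listed. The bit positions are the ones Intel documents for the 80386 (P in bit 0, R/W in bit 1, U/S in bit 2, A in bit 5, D in bit 6, bits 9 through 11 left to software, and the page frame address in bits 12 through 31). The function and key names are our own.

```python
# Decode a 32-bit 80386 page table entry (function and key names are illustrative).
def decode_pte(entry):
    return {
        "frame_base": entry & 0xFFFFF000,     # bits 31..12: physical base address of the page
        "available": (entry >> 9) & 0x7,      # bits 11..9: free for the systems programmer
        "dirty": bool(entry & (1 << 6)),      # D: the page has been written
        "accessed": bool(entry & (1 << 5)),   # A: the page has been accessed
        "user": bool(entry & (1 << 2)),       # U/S: 1 = user level, 0 = supervisor only
        "writable": bool(entry & (1 << 1)),   # R/W: 1 = read/write, 0 = read-only
        "present": bool(entry & 1),           # P: the page is in physical memory
    }

pte = decode_pte(0x00123067)
print(hex(pte["frame_base"]), pte["dirty"], pte["present"])   # 0x123000 True True
```

An operating system hunting for swap candidates would look for entries with the present bit set and the accessed bit clear, and would write a page back to disk only if its dirty bit is set.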

Note that bits 9, 10, and 11 are available for the systems programmer to use. These
bits could indicate whether a particular page can be removed from physical memory.
Note that the current page directory and page tables must always stay in memory. So
must memory-mapped I/O devices, basic operating system functions (the kernel), the
global descriptor table, and basic interrupt and exception servicing functions. You may
want to keep all interrupt handlers in memory to reduce startup time (latency).

The system programmer may also want to indicate whether a page contains code or
data. This is not inherent in page typing as it is with segments.


Page Cache


One problem with paging is that it could take a long time. If address translation re- 
quired the processor to access both a page directory and a page table in memory, each 
instruction would take many extra cycles. In practice, to avoid this, the processor saves 
the most recently used page-table data in an on-chip cache. It then accesses memory 
only if a particular page is not in the cache. 

The page cache (called the translation lookaside buffer) contains 32 entries or- 
ganized as shown in Figure 5-12. Each entry consists of a 24-bit tag field and a 20-bit 
data field. The tag field contains the high-order 20 bits of the linear address, the valid 
bit, and the three attribute bits (present, read/write, and user/supervisor). The data field 
contains the high-order 20 bits of the physical address. Remember that the low-order 
12 bits of the physical address and the linear address are the same. 

As the translation lookaside buffer (TLB) has 32 entries, it can provide access to 32
pages or 128 Kb of memory. That is, if a program’s main operating section uses an
area of memory less than 128 Kb in size (we call this its working set), it can run without
any cache misses after initialization. Page faults will initially cause the operating system
to load the working set into memory. The processor will then use the set continually
without any extra memory accesses for paging. Simulations have shown that most
programs get at least 95 percent hits with a 32-entry page cache.

One problem with paging is the need to flush the page cache whenever the page
tables change. Obviously, all the entries are now invalid, as the mapping has changed.
However, the processor does not flush the cache automatically. A program must flush
it explicitly either by reloading CR3 (the page directory base register) or by invoking
a new task that reloads CR3.
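The interplay between table walks, the page cache, and flushing can be seen in a toy model. The Python sketch below is ours and makes several simplifying assumptions: physical memory is a dictionary, the cache is a plain dictionary that evicts its oldest entry when full (the real TLB's internal organization is not modeled), and protection bits are ignored. It assumes the usual split of a linear address into a 10-bit directory index, a 10-bit table index, and a 12-bit offset, which follows from the 1K-entry tables and 4K pages described earlier.

```python
# Toy model of 80386 address translation with a page cache (all names are illustrative).
class PageFault(Exception):
    pass

class Pager:
    TLB_ENTRIES = 32

    def __init__(self, memory, cr3):
        self.memory = memory     # {physical address: 32-bit table entry}
        self.tlb = {}            # linear page number -> physical page base
        self.walks = 0           # number of table walks (cache misses)
        self.load_cr3(cr3)

    def load_cr3(self, cr3):
        """Reloading CR3 selects a page directory and flushes the page cache."""
        self.cr3 = cr3 & 0xFFFFF000
        self.tlb.clear()

    def _entry(self, table_base, index):
        entry = self.memory.get(table_base + 4 * index, 0)
        if not entry & 1:                        # P bit clear: not in memory
            raise PageFault(hex(table_base + 4 * index))
        return entry

    def translate(self, linear):
        page = linear >> 12                      # high-order 20 bits
        offset = linear & 0xFFF                  # low-order 12 bits pass through unchanged
        if page not in self.tlb:                 # miss: read directory, then page table
            self.walks += 1
            pde = self._entry(self.cr3, linear >> 22)
            pte = self._entry(pde & 0xFFFFF000, page & 0x3FF)
            if len(self.tlb) >= self.TLB_ENTRIES:
                self.tlb.pop(next(iter(self.tlb)))   # evict oldest (a simplification)
            self.tlb[page] = pte & 0xFFFFF000
        return self.tlb[page] | offset

memory = {0x1004: 0x2001, 0x2004: 0x5001}    # directory entry 1, page table entry 1
pager = Pager(memory, cr3=0x1000)
print(hex(pager.translate(0x00401234)))      # 0x5234
memory[0x2004] = 0x6001                      # remap the page without flushing
print(hex(pager.translate(0x00401234)))      # 0x5234 (stale cached mapping)
pager.load_cr3(0x1000)                       # flush by reloading CR3
print(hex(pager.translate(0x00401234)))      # 0x6234
```

The middle result is the point of the exercise: until CR3 is reloaded, the cached entry still describes the old mapping.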

The page cache (TLB) introduces complications. Misses make access time variable 
because of the unpredictable effects of interrupts, exceptions, and task switches (see 
Chapters 6 and 7). Furthermore, some misses occur even for entries that are in the 
cache. The special case occurs when the processor writes into a page for the first time. 
That is, the D bit in the page’s entry is cleared. The processor must then go through the 
miss procedure just to set the D bit. Note that the procedure also sets the A bit
automatically.

Figure 5-12
Structure of the translation lookaside buffer.


MEMORY PROTECTION 


The 80386’s protection features are intended to increase software reliability and
provide a sound base for multitasking and multiuser systems. They help identify errors
that could change memory locations improperly, conflict with other uses of system
functions or I/O, or override operating system programs. These features are particularly
important in multitasking and multiuser applications. There, one program may
inadvertently affect other tasks or other users as well as itself. Protection features act
much like the regulations, restrictions, and penalties that help protect shared public
facilities such as parks, schools, and streets.

There are five aspects to protection in the 80386:

- Type checking. An example is determining whether a segment that is
going to be used for instructions actually contains code.

- Limit checking. An example is determining whether a reference falls
within the limit of a segment or the bounds of an array.

- Restriction of addressable domain. An example is determining whether
a segment is accessible by its caller or is protected from it.

- Restriction of procedure entry points. This forces entries into an operating
system or other shared software to pass through well-defined checkpoints.

- Restriction of instruction set. This means that only trusted procedures
can execute instructions that perform memory management tasks, overall
system control, or input/output.

All checks occur during address generation, so they do not slow the processor. 

A key element in the 80386’s protection mechanisms is the concept of privilege
level. 80386 descriptors can have four privilege levels (0 through 3, where 0 is the
highest). These levels restrict access to a descriptor. Only callers with a privilege level
at or higher than the descriptor’s privilege level (DPL) can access it. An attempt by a
caller at a lower privilege level causes a general protection exception (see Chapter 7).

Be careful of the fact that levels become more privileged as their numbers decrease.
We will speak of “more privileged” segments and “higher privilege levels” without
specifying numbers. In fact, more privileged segments are at lower-numbered levels.
Fortunately, operating systems usually handle the details of privilege levels without
forcing users to think in reverse.

The approach is the same as the common practice of referring to a “first team” or
“first string” or to “first-rate.” Thus, second or third stringers play behind first stringers,
and second-rate products are inferior to first-rate ones. Also, in many sports high numbers
indicate players who seldom appear or are unlikely to make the team.

Systems need not use all four privilege levels. In fact, a system need not even use
privilege; in this case, all its segments should be at level 0. A common situation is
for a system to have user and supervisor (operating system) levels. User segments
should then be at level 3 and operating system segments at level 0. Note that these
privilege levels apply to segments (logical memory), not to pages (physical memory).

Pages, as we noted earlier, have their own privilege levels defined by the
user/supervisor (U/S) bit. Here there are only two levels: 0 (supervisor) and 1 (user).
The supervisor level, confusingly enough, corresponds to descriptor privilege levels
0, 1, or 2. The user level corresponds to descriptor privilege level 3. This makes at least
as much sense as the playoff systems in major sports leagues. The end result is that:

- Procedures running at levels 0, 1, or 2 can access all pages.

- Procedures running at level 3 can access only user-level pages.

Note also that the read/write bit limits the write access of procedures running at level
3 but not of those running at levels 0, 1, or 2. Be careful of the fact that typical
two-level (user/supervisor) systems use segment privilege levels 0 and 3 but page privilege
levels 0 and 1.
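The page-level rules just stated fit in a few lines. The following Python sketch is our own restatement (the function and parameter names are illustrative); it models only the page check, not the segment checks that the processor applies first.

```python
def page_access_allowed(cpl, user_page, writable_page, is_write):
    """Page-level check: cpl is the current privilege level (0..3),
    user_page is the U/S bit, writable_page is the R/W bit."""
    if cpl <= 2:                  # levels 0, 1, and 2 all count as supervisor
        return True               # supervisor can read and write every page
    if not user_page:             # level 3 may touch only user-level pages
        return False
    return writable_page or not is_write    # R/W restricts writes only

print(page_access_allowed(3, user_page=True, writable_page=False, is_write=True))   # False
print(page_access_allowed(2, user_page=False, writable_page=False, is_write=True))  # True
```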

In most cases, page-protecting memory is redundant if you have already
segment-protected the underlying programs and data. The 80386 checks segment privilege levels
before page privilege levels anyway. However, the page protection costs nothing and
may give the system designer extra peace of mind. Page protection can also help trap
references to code pages or unallocated pages. Another use for it is to protect only part
of a large code or data segment. This is particularly important for programs that lie
entirely within a single segment.

Descriptors contain type and limit information that systems can use for protection. 
Note, for example (see Figure 5-4), that a data segment descriptor has a writable (W) 
bit that specifies whether instructions can write into it. Similarly, an executable-seg- 
ment descriptor has a readable (R) bit that specifies whether instructions can read from 
it. A page’s read/write bit can have a similar effect on user-level procedures. 

The 80386 recognizes the following as protection exceptions: 

- Violating segment or page privilege level restrictions.

- Loading the CS register with a selector of a nonexecutable segment.

- Loading any data segment register with a selector of an unreadable
executable segment.

- Loading the stack segment register with a selector of a nonwritable segment.

- Trying to write into an executable segment. In protected mode, the 80386
thus does much more than just politely discourage self-modifying code.
An interpreter or debugger can, however, write into a code segment by
defining a contiguous data segment. That is, you must define two separate
segments that refer to the same addresses.

- Trying to write into a data segment or a page that is not writable.

- Trying to read from an executable segment that is not readable.

The processor uses a segment descriptor’s limit field to check references. Only ones
that fall within the limit are valid. A complicating factor in this determination is the E
bit in a data segment descriptor. This bit, the expansion-direction bit, decides whether
the segment is the usual (expand up) or expands down. In the expand-down case (used
mainly for stacks), the range of valid addresses is from limit + 1 to either 64K or 4 Gb,
depending on the B (big) bit. The upper limit is 64K if the B bit is 0 and 4 Gb if it is 1.
Note the following special features of an expand-down segment:

- It has maximum size when the limit is zero.

- You can expand a stack in size by copying it to a larger segment without
updating intrastack pointers. The stack then simply occupies the upper
part of a larger area.

Table 5-2 shows useful combinations of the E, G, and B bits.

Table 5-2
Useful combinations of the E (expansion-direction), G (granularity),
and B (big) bits in data segment descriptors

(Cases 1 through 4. The per-case marks are not legible; the row values are listed.)

Lower bound is:     0, LIMIT+1, or shl(LIMIT,12,1)+1
Upper bound is:     LIMIT, shl(LIMIT,12,1), 64K-1, or 4G-1
Max seg size is:    64K, 64K-1, 4G-4K, or 4G
Min seg size is:    0 or 4K

shl(X,12,1) = shift X left by 12 bits inserting one-bits on the right

Fortunately, expand-down segments are rare in practice. The only exceptions are
stacks that the operating system usually manages automatically anyway.
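The bounds rules for expand-up and expand-down segments can be written out directly. The Python sketch below is ours, not Intel's algorithm as published. It assumes a 20-bit LIMIT field, that a set G bit scales the limit by shifting it left 12 bits and filling with one-bits, and that only single-byte references are being checked. The function and parameter names are illustrative.

```python
def segment_bounds(limit, g, e=0, b=0):
    """Return (lowest, highest) valid offset for a data segment.
    limit: LIMIT field; g, e, b: granularity, expansion-direction, and big bits."""
    # With G = 1 the limit counts 4K units: shift left 12 bits, inserting one-bits.
    effective = (limit << 12) | 0xFFF if g else limit
    if not e:                                  # expand up: 0 through the limit
        return 0, effective
    top = 0xFFFFFFFF if b else 0xFFFF          # expand down: limit + 1 through 4G-1 or 64K-1
    return effective + 1, top

def offset_valid(offset, limit, g, e=0, b=0):
    low, high = segment_bounds(limit, g, e, b)
    return low <= offset <= high

print(segment_bounds(0xFFFF, g=0))               # (0, 65535)
print(segment_bounds(0, g=0, e=1, b=0))          # (1, 65535)
print(segment_bounds(0, g=1, e=1, b=1))          # (4096, 4294967295)
```

The last two lines show why an expand-down segment is largest when its limit is zero: lowering the limit lowers the bottom of the valid range while the top stays fixed.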


Domain Restrictions 


When the processor loads a data segment’s selector into a data or stack segment register 
(DS, ES, FS, GS, or SS), it automatically evaluates access from the currently execut- 
ing segment. Figure 5-13 shows how this works. The evaluation involves comparisons 
of three privilege levels: 

- The CPL (current privilege level) of the executing segment

- The RPL (requestor’s privilege level) of the selector

- The DPL (descriptor privilege level) of the target segment’s descriptor

An instruction may use the target segment only if its DPL is larger than or equal to both
the CPL and the selector’s RPL. Remember that less privileged segments have higher
privilege numbers. Thus the idea is that the target segment must be equal to or lower in
privilege than both the CPL and the RPL.

Suppose, for example, that the CPL is 3. That is, the current privilege level is the 
lowest (user level). User programs can access only selectors and descriptors at level 3. 
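The comparison reduces to a single test. In this Python sketch (the function name is ours), privilege levels are the numbers 0 through 3, so "less privileged" means numerically larger.

```python
def data_access_allowed(cpl, rpl, dpl):
    """True if a segment whose descriptor has privilege DPL may be loaded
    by code at level CPL using a selector with requestor level RPL."""
    return dpl >= max(cpl, rpl)

print(data_access_allowed(cpl=3, rpl=3, dpl=3))   # True: user code, user data
print(data_access_allowed(cpl=3, rpl=3, dpl=0))   # False: user code, system data
print(data_access_allowed(cpl=0, rpl=0, dpl=3))   # True: system code, user data
```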

A special situation is one in which a routine running at privilege level 3 tries to pass
an operating system routine (running at privilege level 0) a selector with privilege level
0. The operating system routine would then be able to use the selector, as it has the
appropriate privilege level. The way to foil this backdoor effort is to use the ARPL
(Adjust Selector’s RPL Field) instruction to reduce the selector’s privilege level. ARPL
adjusts the selector’s RPL to not less than the caller’s CPL. The descriptor (at privilege
level 0) is now inaccessible because it is more privileged than the selector.
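A small model shows the effect of ARPL. In this Python sketch (names ours), a selector is a 16-bit value whose two low-order bits hold the RPL, and the caller's privilege is taken from the RPL field of the caller's code segment selector. The sketch returns the adjusted selector together with a flag saying whether an adjustment was made.

```python
def arpl(selector, caller_selector):
    """Raise the RPL (bits 1..0) of selector to at least that of caller_selector.
    Returns (new selector, True if it was changed)."""
    rpl, caller_rpl = selector & 3, caller_selector & 3
    if rpl < caller_rpl:
        return (selector & ~3) | caller_rpl, True
    return selector, False

# A level 3 caller (code selector 001BH) hands the system a selector with RPL 0.
adjusted, changed = arpl(0x0008, 0x001B)
print(hex(adjusted), changed)          # 0xb True

# The system routine (CPL 0) now fails the usual test against a level 0 descriptor.
cpl, dpl = 0, 0
print(dpl >= max(cpl, adjusted & 3))   # False
```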

The idea behind this kind of restriction is to prevent user programs from changing 
page directories, page tables, descriptor tables, or values internal to the operating sys- 
tem. For example, user programs could not change disk parameters or interrupt-han- 
dling routines as they can in current versions of MS-DOS. 


Restricting Control Transfers 


Figure 5-13
Privilege check for access to a data segment.

The introduction of privilege levels raises several new questions such as:

How can a user routine call an operating system function (or utility) that it may need?

How do you write general functions such as code conversions, mathematical
routines, and utilities that programs at all privilege levels can access?

The answer to the first question is to use gate descriptors. There are four kinds: 

- Call gates

- Trap gates

- Interrupt gates

- Task gates

We will describe only call gates here. We will discuss task gates in Chapter 6 and trap
and interrupt gates in Chapter 7.

Gates reside in either the global descriptor table or a local descriptor table. Call gates
usually are in the global descriptor table so that all tasks have access to them.

We can think of a call gate as a border crossing station. It allows only legal entries.
Here is where the authorities look at your papers and determine whether you can pass
to the other side. Similarly, call gates both define a procedure’s entry point and specify
its privilege level. The hardware recognizes references to call gates and expands CALL
instructions appropriately.

Figure 5-14 shows the format of a call gate. It has selector and offset fields that form 
a pointer to the entry point. Thus a call gate basically provides an indirect transfer to a 
system procedure. The selector must refer to the descriptor of an executable segment.

Figure 5-14
Format of an 80386 call gate. (Fields, upper double word: OFFSET 31..16, TYPE,
DWORD COUNT; lower double word: SELECTOR, OFFSET 15..0.)





The calling instruction only has to specify the gate’s selector; the processor ignores the 
offset. A call gate also has a dword count that indicates how many double words the 
processor must move from the caller’s stack to the new privilege level’s stack. Note 
that each privilege level has its own stack to maintain system integrity. 
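To make the layout concrete, the following Python sketch packs and unpacks an 8-byte call gate. The field positions follow Intel's published format for the 80386 (the lower double word holds the selector and offset bits 15..0; the upper double word holds offset bits 31..16, the present bit, the DPL, the type code 0CH, and the 5-bit dword count). The function names and the dictionary keys are our own.

```python
import struct

CALL_GATE_386 = 0xC    # type code of an 80386 call gate

def pack_call_gate(selector, offset, dpl, dword_count, present=True):
    low = (selector << 16) | (offset & 0xFFFF)
    high = ((offset & 0xFFFF0000)
            | (int(present) << 15) | (dpl << 13) | (CALL_GATE_386 << 8)
            | (dword_count & 0x1F))
    return struct.pack("<II", low, high)      # 8 bytes, low double word first

def unpack_call_gate(raw):
    low, high = struct.unpack("<II", raw)
    return {
        "selector": low >> 16,
        "offset": (high & 0xFFFF0000) | (low & 0xFFFF),
        "present": bool(high & 0x8000),
        "dpl": (high >> 13) & 3,
        "type": (high >> 8) & 0xF,
        "dword_count": high & 0x1F,
    }

gate = pack_call_gate(selector=0x0008, offset=0x00123456, dpl=3, dword_count=2)
print(gate.hex())                              # 5634080002ec1200
print(unpack_call_gate(gate)["dword_count"])   # 2
```

A gate with DPL 3, as here, is one that user-level code may call even though the selector inside it can refer to a more privileged code segment.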

Call gates guarantee legitimate entry points. There are no backdoors or side entries 
into the operating system as there are in MS-DOS. The idea is to force all software to 
be well-behaved and therefore capable of running together with other programs and 
under new versions of the operating system. MS-DOS programs are often incompatible 
and incapable of running under new DOS versions because they use nonstandard entry 
points and interfere with DOS’ internal workings. Note, for example, the well-known 
inability of many memory-resident programs (such as Borland’s SideKick and 
RoseSoft’s ProKey) to work together or at the same time as other programs such as 
Microsoft Word and Lotus 1-2-3. 

Why do programs often skirt or avoid restrictions imposed by an operating system? 
Typical reasons are: 

- To do operations (such as screen updates and disk transfers) faster than
the OS supports by accessing memory or I/O directly.

- To control I/O operations (such as keyboard entry) in more detail than
the OS allows. For example, the program may want to use key combinations
that the OS does not recognize or to differentiate between key
closures and releases.

- To provide immediate activation of features by intercepting OS calls.
Seldom-used key combinations can then provide direct access to a resident
program.

- To substitute for DOS functions, thus expanding or extending commands
that handle graphics, backup, printing, communications, and other tasks.

- To introduce new features, such as multitasking, networking, encryption,
security, or real-time control.

- To access unsupported I/O devices, such as an optical disk, a document
reader, or a mass storage system.

An operating system may have many call gates. The advantage of more gates is that
less dispatching is then necessary to access different functions. The disadvantage is the
complication in programming, documentation, and debugging.


Successful passage through a call gate requires a change of stacks if the privilege
level changes. This involves the following steps:

1. Checking the size of the new stack. If it is not large enough to hold the parameters
and linkages, a stack fault occurs (see Chapter 7).

2. Pushing the old values of the SS and ESP registers onto the new stack. They provide
the linkage back to the previous stack. The transfer of SS is a 32-bit operation in
which the upper 16 bits are wasted.

3. Copying parameters from the old stack to the new stack. The double word count is
in bits 0 through 4 of the first double word of the call gate (see Figure 5-14). The
count may be as large as 31; it does not include parameters passed in registers. The
processor does not perform any validity checks on the parameters.

4. Pushing a pointer to the instruction after the CALL onto the new stack. This pointer
consists of a code segment selector and an instruction pointer value. It provides a
link back to the calling program.


The new stack therefore contains the following items, starting from the top (see
Figure 5-15):

- Instruction pointer value for the instruction after the CALL

- Code segment register value for the instruction after the CALL (extended
to 32 bits)

- Parameters copied from the old stack

- Old stack pointer value

- Old stack segment register value (extended to 32 bits)


The OS can use the old stack pointer and stack segment register values to copy more
parameters if necessary.

Figure 5-15
New stack contents after a change of privilege levels.
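The order of the pushes determines the final layout. The following Python sketch (ours; it models stacks as lists of double words with the top of the stack first, and it omits the check of the new stack's size) builds the new stack in the order the processor does.

```python
def call_through_gate(old_stack, old_ss, old_esp, return_cs, return_eip, dword_count):
    """Return the new stack as a list of double words, top of stack first.
    old_stack lists the caller's double words, top of stack first."""
    new_stack = []                               # a push is modeled as insert(0, ...)
    new_stack.insert(0, old_ss & 0xFFFF)         # old SS, extended to 32 bits
    new_stack.insert(0, old_esp)                 # old ESP
    for dword in reversed(old_stack[:dword_count]):
        new_stack.insert(0, dword)               # parameters keep their original order
    new_stack.insert(0, return_cs & 0xFFFF)      # CS for the instruction after the CALL
    new_stack.insert(0, return_eip)              # EIP for the instruction after the CALL
    return new_stack

stack = call_through_gate([0x11, 0x22, 0x33], old_ss=0x23, old_esp=0x1000,
                          return_cs=0x1B, return_eip=0x400, dword_count=2)
print([hex(x) for x in stack])
# ['0x400', '0x1b', '0x11', '0x22', '0x1000', '0x23']
```

Only the first two double words of the caller's stack are copied here because the gate's count is 2; the third stays behind on the old stack, where the system routine can still reach it through the saved SS and ESP.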


The parameter count is a fixed value for a given call gate. Copying parameters is 
thus awkward when one call gate provides access to several systems routines. Separate 
gates, each with its own parameter count, are a better approach. 

Of course, what goes up (in privilege level) must eventually come down. The opposite
of the CALL through a call gate is a return (RET) instruction. RET can change
privilege levels, but only downward. The reason for this restriction is that a sneaky
underprivileged program could otherwise use an RET to access a higher level. After all,
who would know that no one had ever called it in the first place? It’s like returning
innocently after intermission to a performance for which you had no ticket.

However, a more privileged program can always call a less privileged program by
pushing its address onto the stack and doing an RET. This is like joining the returning
crowd after you missed the first part of a performance. It may seem sneaky, but you
haven't violated any rules. Besides, you end up with a great alibi for a murder mystery.
“I can assure you that Lord Peter Witless was at the symphony that evening. I distinctly
remember him snoring quite loudly through the entire second half of the performance.”

An RET that returns to a less privileged level works as follows: 

1. It does segment checks and then loads the instruction pointer, code segment register,
stack pointer, and stack segment register from the stack.
2. It adjusts the old stack pointer by a number of bytes given as a parameter.




3. It checks all data segment registers (DS, ES, FS, and GS) and clears any that refer
to segments more privileged than the new privilege level. This prevents the new
code from accessing more privileged segments using leftover selectors. The situation
is like a social climber who makes contacts using an aristocrat’s discarded
stationery.


Note that the processor need not save the current stack pointer. Presumably its value 
has not changed. 
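The third step, clearing leftover selectors, can be sketched as follows. This Python model is ours and is deliberately simplified: a dictionary of descriptor privilege levels stands in for the descriptor tables, selector 0 is treated as the null selector, and conforming code segments are not considered.

```python
def scrub_segment_registers(registers, descriptor_dpl, new_cpl):
    """registers: {"DS": selector, ...}; descriptor_dpl: {selector: DPL}.
    Returns a copy in which selectors for segments more privileged
    than new_cpl have been replaced by the null selector (0)."""
    cleaned = {}
    for name, selector in registers.items():
        too_privileged = selector != 0 and descriptor_dpl[selector] < new_cpl
        cleaned[name] = 0 if too_privileged else selector
    return cleaned

registers = {"DS": 0x10, "ES": 0x23, "FS": 0, "GS": 0x10}
dpls = {0x10: 0, 0x23: 3}                    # 0010H is system data, 0023H is user data
print(scrub_segment_registers(registers, dpls, new_cpl=3))
# {'DS': 0, 'ES': 35, 'FS': 0, 'GS': 0}
```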


Conforming Code Segments 


The 80386 also provides a way to create procedures that can run at any privilege level.
These could include mathematical functions, code conversions, and other general-purpose
utilities. The method involves setting the C (conforming) bit in the type information
of a code segment descriptor (see Figure 5-4).

When the processor transfers control to a conforming segment, it does not change 
the current privilege level. The segment thus executes at whatever privilege level its 
caller has. This is the only case in which the current privilege level may not be the same 
as the descriptor privilege level for the current executable segment. A program can do 
a JMP or a CALL to a conforming segment; it need not go through a gate, regardless
of the segment’s inherent descriptor privilege level. Exception handlers can also be 
conforming segments. This is convenient for divide faults, overflow, and array bounds 
checks in which the handler need only access the current task’s data. It does not have 
to access operating system functions as it would to deal with page or segment faults. 


CREATING DESCRIPTORS 


Obviously, creating descriptors is a complex job. There are several different formats,
each with many fields. In common practice, only compilers and operating systems
create descriptors and arrange them in tables. Anyone who needs to do this will generally
use special software tools. They let you define tables and descriptors in simple terms
through a special language. The tool then produces the properly formatted output.

For example, Intel offers a 386 System Builder (BLD386) that runs under XENIX.
It consists of a binder, a builder, a librarian, and a mapper. No candlestick maker, but
they do form a fine barbershop quartet. The binder links modules and creates a loadable
form. The builder is the actual working program. The librarian allows you to maintain
and manage library files for use in building new applications. The mapper generates
printed information such as segment maps, gate maps, and symbol maps from object
files.


PRIVILEGED INSTRUCTIONS 


Still another aspect of privilege is the existence of instructions that only privileged
procedures can execute. We will discuss system control instructions here and I/O-related
instructions in Chapter 6. A procedure can execute the following instructions only if
its current privilege level is 0:

CLTS               Clear Task-Switched Flag
HLT                Halt Processor
LGDT               Load Global Descriptor Table Register
LIDT               Load Interrupt Descriptor Table Register
LLDT               Load Local Descriptor Table Register
LMSW               Load Machine Status Word
LTR                Load Task Register
MOV to/from CRn    Move to/from Control Register n
MOV to/from DRn    Move to/from Debug Register n
MOV to/from TRn    Move to/from Test Register n

These instructions initialize many of the memory management system’s pointers
and parameters, such as the global and local descriptor table registers and the page
directory base register (control register 3). Obviously, these instructions would not
appear in most user programs anyway.


INITIALIZATION OF MEMORY MANAGEMENT SYSTEMS 


After RESET, the processor starts in real mode at physical address FFFFFFF0H. The
first far (intersegment) JMP or CALL makes the processor continue in the lowest 1 Mb
of physical memory. The processor must then:

- Set the PE flag to enter the protected mode.

- Initialize the global descriptor table and the GDT register. Initialize local
descriptor tables and the LDT register if necessary.

- Initialize the page directories, page tables, and page directory base
register if the system is using paging. Also set the PG bit to 1.

The processor can do most of these actions either in the real mode or in the protected 
mode. The simpler approach is to do them in the real mode. Then you need not worry 
about implicitly using protected mode features (such as the global descriptor table) that 
you have not yet initialized. Be sure to do the following: 

- Put a JMP immediately after the instruction that sets the PE flag. This
clears the instruction prefetch queue, eliminating information related to
the real mode.

- Set PG after setting PE or at the same time. You cannot set PG when the
processor is in the real mode.

- Put a JMP immediately after the setting of the PG flag. This ensures
consistent addressing before and after the enabling of paging.
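The ordering rule for PE and PG can be stated as a small check. In this Python sketch the bit positions are Intel's (PE is bit 0 of control register 0, PG is bit 31). The function name and the use of an exception are our own modeling choices; they are not a description of what the hardware does when the rule is broken.

```python
CR0_PE = 1 << 0      # protection enable
CR0_PG = 1 << 31     # paging enable

def checked_cr0(value):
    """Return the value to load into CR0, rejecting PG without PE."""
    if value & CR0_PG and not value & CR0_PE:
        raise ValueError("PG requires PE: paging exists only in protected mode")
    return value & 0xFFFFFFFF

cr0 = checked_cr0(CR0_PE)             # step 1: enter protected mode
cr0 = checked_cr0(cr0 | CR0_PG)       # step 2: enable paging
print(hex(cr0))                       # 0x80000001
```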


SUMMARY 


The 80386 can operate in two modes: real mode and protected (virtual) mode. In real 
mode, it acts like a fast 32-bit version of the 8086 processor. In protected mode, it acts 
like a 32-bit version of the 80286 processor. Most applications use real mode only for 
initialization and use protected mode for actual operations. Within protected mode, the 
80386 can operate in virtual 8086 mode. This allows it to run 8086 (particularly MS- 
DOS) software simultaneously with software that uses new 80386 features. 

The 80386 divides program, data, and stack areas into units called segments. It 
provides access to up to six segments at a time through segment registers. Addresses 
within a segment are called offsets. 

In its 8086-like modes, the 80386 uses 16-bit offsets. The segment registers contain 
the base addresses of segments divided by 16. The processor computes a physical ad- 
dress by multiplying a segment register’s contents by 16 and adding the offset. The 
relationship between logical and physical addresses is thus a simple arithmetic func- 
tion. 
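As a reminder of how simple this function is, here is the calculation in Python (the function name is ours).

```python
def real_mode_address(segment, offset):
    """Physical address in the 8086-like modes: segment * 16 plus a 16-bit offset."""
    return (segment << 4) + (offset & 0xFFFF)

print(hex(real_mode_address(0xB800, 0x0010)))   # 0xb8010
print(hex(real_mode_address(0xB801, 0x0000)))   # 0xb8010 (same byte, different pair)
```

The second line shows a well-known consequence: many different segment and offset pairs name the same physical location.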

In protected mode, the 80386 uses 32-bit offsets. The segment registers contain
indexes into tables of descriptors. Each descriptor, in turn, contains a segment’s base
address, limit, and other attributes. The processor computes intermediate results (linear
addresses) by obtaining the base address from a descriptor table and adding the offset.
Descriptors may be in either a global descriptor table that applies to all tasks or local
descriptor tables that apply only to particular tasks.

In protected mode, the 80386 can also implement paging. Pages are 4-Kb units of
physical storage. The processor accesses them through a two-stage process involving
a page directory and page tables. The 80386 automatically keeps information on recently
accessed pages in a special on-chip cache to avoid repetitive memory operations.
Attempts to access pages that are not currently in memory cause page-fault exceptions.
The operating system must then read the page from disk.

The 80386 offers many protection mechanisms for preventing improper accesses. 
These include: 

- Four levels of privilege for descriptors and two levels for pages

- Segment and page attributes that determine type and accessibility

- Segment bounds included in the descriptors

- Call gates that restrict access points to privileged routines

- Special status, control, and memory management instructions that only
privileged programs can execute.


Delightful task! To rear the tender thought,
To teach the young idea how to shoot.
J. Thomson, The Seasons. Spring


Do not pray for easy lives. Pray to be stronger
men! Do not pray for tasks equal to your
powers. Pray for powers equal to your tasks.
P. Brooks, Going Up to Jerusalem


This chapter describes 80386 task management techniques. It first explains the reasons 
behind tasking and then discusses 80386 tasking features, task switching, task linking, 
address spaces, I/O privilege levels, I/O permission bit maps, and the initialization of 
tasking systems. Tasking is a key to understanding the 80386 because its memory
management, protection, and exception handling facilities are all task based.


WHAT IS TASKING?


A task is just a program together with its associated data and assigned memory areas. 
It is like a process in Unix and other multiprogramming or multiuser operating sys- 
tems. In most systems, a task is self-contained. It is a sovereign body, with its own 
entry points, program code, data areas, stack, and current state or status (often called 
its context). A task may have subroutines, data structures, and message passing tech- 
niques. It may call library programs and use other shared facilities. 
For example, let us consider an accounting system. It may have tasks that do the
following:

- General ledger
- Accounts payable
- Accounts receivable
- Payroll
- Report writing
- Audit preparation

Generally, the user will select one of these tasks from a menu. They are typically
completely independent programs linked through common data structures or a database.
These tasks may, in turn, consist of subtasks. The general ledger task may, for
example, include the following subtasks:

- Data entry
- Account file management
- Sorting
- Query
- Printing
- End of period processing

Each subtask is, in turn, a self-contained entity.
Similarly, we may describe an energy management system as consisting of tasks.
Its tasks might be

- System setup
- Operator interaction
- Data monitoring
- System control
- Alarm recognition
- Status reporting
- Time-keeping
- Emergency handling

Some of these, such as operator interaction and alarm recognition, may work via
interrupts. That is, once set up, the management system simply monitors its inputs and
controls heating and air conditioning in a continuous loop. The operator has a special button
that suspends normal operations and activates either a manual override or a task that
accepts commands. Alarms similarly force immediate action.

Note that the priority of the tasks is critical here. The operator and alarm tasks must
take priority over normal operations. Similarly, status requests must be able to
interrupt the usual monitoring and control. The overall system must be able to stop one task,
start another, and then resume the first task where it left off. It is thus important to be
able to:

- Start and stop tasks.

- Suspend a task and resume it later. It makes no sense, for example, to
have a printing task remain in control when the printer is busy, offline,
or malfunctioning.

- Pass information from one task to another. For example, the alarm task
must be able to provide a description of what happened and when to the
status reporting task.

A personal computer may also run several tasks that are actually separate programs. 
For example, a user might have the following programs running at the same time under 
a multitasking operating system such as OS/2 or Unix: 

• A word processor printing a long document

• A spreadsheet for creating a table to be attached to the document

• A scratchpad accessory for jotting down notes from a telephone
conversation

• A calculator for making a few quick computations on the data received
by telephone

• A communications program for retrieving historical information from a
remote financial or economic database

Here again, you want to suspend tasks when they are done or have gone as far as 
they can without more input or other external events. Suppose, for example, that you 
have reached the point in the spreadsheet where you need the historical data. Or per- 
haps the word processor has printed everything up to the table, or the communications 
program has encountered transmission problems. You will then want to resume the 
task later without any problems. You will also want to move information from one task 
to another. For example, you must move the calculator’s results and the information 
retrieved from the remote database into the spreadsheet. You must then move the 
spreadsheet’s output to the word processor.

Tasking thus has many advantages. It allows you to:

• Do several things simultaneously. You can switch from one task to
another when necessary or when the original task no longer requires your
attention.

• Change one task without changing the others. For example, in the
energy management system, you could change inputs, outputs, alarm
handling methods, or control techniques without affecting the other tasks.
Similarly, you could add inputs or improved outputs. You could also add
remote dialing, a tape backup, or communications capabilities.

• Pinpoint problems within a single task. You can use tasking to isolate
problems to part of a large system. A simple approach is to keep
replacing tasks with dummy versions until the error disappears.

• Incorporate tasks from the operating system or outside sources. Tasks
such as a keyboard handler may be common to many applications.

Intel introduced tasking features in the 80286 and expanded them in the 80386. This 
is not to say that one could not do tasking on previous processors such as the 8086 and 
8088. However, those processors did not provide explicit support such as special in- 
structions and built-in data structures and registers. The built-in features execute faster 
and provide more standardization at the cost of some flexibility. Their use also ensures 
compatibility with software from Intel and many other sources. 

Note, however, that the 80386’s tasking features are not essential. One can do task- 
ing without using them. In systems with simple needs, they may create unnecessary 
complexity and overhead. The mere fact that they exist does not make their use either 
essential or desirable. They provide a generalized framework that may be overkill for 
some applications.

The 80386 provides the following task-related features:

• Task state segments.

• Task state segment descriptors.

• Task register.

• Task gate descriptors.

• The NT (nested task) flag in the extended flag register for use in
returning from tasks that have been called by other tasks. NT is bit 14 of
EFLAGS (see Figure 2-3).

Figure 6-1
32-bit task state segment.


Tasking applies in protected mode (including V86 mode) but not in real mode. V86
tasks must have the VM bit (in EFLAGS) set to 1 and must run at privilege level 3. A
virtual machine monitor must handle special V86 exceptions such as INTs used to enter
MS-DOS.


Task State Segments


Task state segments hold a task’s current status in a predefined form. You may com- 
pare them to a business’ balance sheet or a team’s roster with up-to-date statistics. The 
existence of the TSS allows an operating system to readily change a task’s state. The 
OS can activate it, suspend it, or terminate it (or kill it, if you have a violent nature) by 
using its task state segment. Task state segment descriptors define tasks. The task 
register points to the task state segment for the current task. Task gate descriptors 
provide indirect, protected access to task state segments. They do privilege checking 
much like the call gates described in Chapter 5. 

Figure 6-1 shows the fields in a minimum task state segment. Note that even this 
segment is large and complex, occupying at least 104 bytes of memory (26 double 
words). It contains two types of information: 

• Dynamic information that the processor updates each time it switches to
another task. This includes the user registers or machine state: the
general-purpose registers, segment registers, flags, and instruction pointer.
It also includes the selector for the task state segment of the previously
executing task if a return to that task is expected.

• Static information (software state) that is a permanent part of the task’s
environment. This includes the selector for the task’s local descriptor
table, the base address of its page directory, pointers to the stacks for
privilege levels 0 through 2, the debug trap bit, and the I/O map base. We
will discuss the I/O map base (used to access the I/O permission bit map)
later in this chapter.

The task state segment thus includes both a task’s overall environment and its cur- 
rent status. Loading the task state segment not only replaces the usual initialization of 
registers that starts a task, but it also allows a suspended task to resume with its old 
status. This greatly simplifies the task scheduler’s job. The disadvantage of the
approach is that it introduces extra overhead for simple tasks without extensive status
information.
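
The 104-byte figure is easy to check with a short sketch. The Python fragment below
models the minimum 32-bit TSS as 25 doublewords followed by two 16-bit words (the
word holding the debug trap bit and the I/O map base). It is only an illustration of
the layout, not operating system code; the field names and the pack_tss helper are
our own labels, not Intel mnemonics.

```python
import struct

# Fields of the minimum 32-bit TSS in memory order. Each occupies one
# doubleword; selector fields use only the low 16 bits of theirs.
TSS_FIELDS = [
    "back_link",                                   # selector of previous TSS
    "esp0", "ss0", "esp1", "ss1", "esp2", "ss2",   # stacks for levels 0-2
    "cr3",                                         # page directory base
    "eip", "eflags",
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
    "es", "cs", "ss", "ds", "fs", "gs",
    "ldt",                                         # LDT selector
]
# Little-endian: 25 doublewords, then the trap-bit word and I/O map base.
TSS_FORMAT = "<" + "I" * len(TSS_FIELDS) + "HH"


def pack_tss(io_map_base, debug_trap=False, **fields):
    """Pack a minimum TSS image; fields not named default to zero."""
    unknown = set(fields) - set(TSS_FIELDS)
    if unknown:
        raise ValueError("unknown TSS fields: %s" % sorted(unknown))
    values = [fields.get(name, 0) for name in TSS_FIELDS]
    return struct.pack(TSS_FORMAT, *values, int(debug_trap), io_map_base)


# Example values are arbitrary. A map base of 104 lies beyond a limit of
# 103, so this image carries no I/O permission bit map.
image = pack_tss(io_map_base=104, cs=0x0F, eip=0x1000, eflags=0x0202)
print(len(image))    # 104 bytes, that is, 26 doublewords
```

The static fields (stack pointers, page directory base, LDT selector, and I/O map
base) would be filled in once when the task is created, whereas the processor itself
rewrites the dynamic fields on each task switch.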

A task state segment may contain more information than Figure 6-1 shows. Figure
6-2 is an extended task state segment with an optional software state and I/O
permission bit map (used to give a task access to a limited set of I/O devices). We will
discuss the I/O permission bit map (or I/O guard map) later in this chapter. The optional
software state depends on the operating system, rather than being 80386-defined. It
may, for example, contain the current state of a numeric coprocessor (see Chapter 8),
permission bits defined by an operating system such as Unix, scheduling priority,
accounting information, or open file descriptors. There may also be places for debug
registers and other facilities if the operating system uses them. Figure 6-3 shows a TSS
for an example Unix-like operating system called U/386.

Figure 6-3
Process task state segment for a Unix-like operating system U/386.

Figure 6-4
Task state segment descriptor for a 32-bit task state segment.


Task State Segment Descriptors 


A task state segment (TSS) descriptor defines each task state segment. Figure 6-4 shows
its format. The fields are:


• Base address of the segment.

• Limit (must be at least 103, which spans the 104 bytes of a minimum
task state segment). The segment can be as large as 64K.

• Descriptor privilege level (DPL). It determines whether other tasks can
switch to this task. The level is usually 0 so that only trusted procedures
(such as the operating system’s scheduler) can do task switching.

• P (present) bit. Indicates whether the task is in memory.

• G (granularity). Indicates whether the limit is in units of bytes (0) or 4K
bytes (1).

• B (busy). Indicates whether the task is already in use. Tasks are not
reentrant; that is, the processor cannot execute a task while the same
task is active. If several programs call the same task, the operating
system must provide them with multiple copies.


TSS descriptors have the following special features:


• They do not allow for reading or writing of the TSS. The only way to do
either is to create another descriptor that redefines the TSS as a data
segment. Simply loading the TSS descriptor into a segment register causes
an exception.

• They must reside in the global descriptor table. TSS descriptors cannot
be in a local descriptor table.

Figure 6-5
Task register and its relationships to the task state segment and task
state segment descriptor.
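
As a rough illustration of how the fields fit together, the following Python sketch
packs a TSS descriptor into its 8-byte form. It assumes the standard 80386 descriptor
layout, with type code 9 for an available 32-bit TSS and the busy bit as bit 1 of the
type field. The function names and the example base address are ours and purely
hypothetical.

```python
import struct


def make_tss_descriptor(base, limit, dpl=0, present=True, busy=False,
                        granular=False):
    """Return the 8-byte descriptor for a 32-bit task state segment."""
    if not 0 <= limit <= 0xFFFFF:
        raise ValueError("limit must fit in 20 bits")
    if not granular and limit < 103:
        raise ValueError("a 32-bit TSS needs a limit of at least 103")
    # Type 9 is an available 32-bit TSS; with the busy bit set it is 11.
    seg_type = 0x9 | (0x2 if busy else 0)
    access = (int(present) << 7) | ((dpl & 3) << 5) | seg_type
    flags_limit = (int(granular) << 7) | ((limit >> 16) & 0xF)
    return struct.pack("<HHBBBB",
                       limit & 0xFFFF,         # limit bits 0-15
                       base & 0xFFFF,          # base bits 0-15
                       (base >> 16) & 0xFF,    # base bits 16-23
                       access,                 # P, DPL, and type
                       flags_limit,            # G and limit bits 16-19
                       (base >> 24) & 0xFF)    # base bits 24-31


def is_busy(descriptor):
    """Report the B bit, which is bit 1 of the type field."""
    return bool(descriptor[5] & 0x2)


desc = make_tss_descriptor(base=0x00123456, limit=103)
print(desc.hex(), is_busy(desc))
```

Because the busy bit is part of the type field, marking a task busy changes the
descriptor's type from 9 to 11 rather than setting a separate flag.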


Task Register 


The task register contains the selector for the currently executing task. Like a segment
register, this register has visible and hidden parts. The hidden parts contain the task
state segment’s base and limit, derived from the descriptor when its selector is loaded.
Figure 6-5 shows the relationships among the elements that define a task.
There are special instructions for loading and storing the task register. They are:
• LTR loads the task register from a 16-bit general-purpose register or from
a memory word.
• STR stores the task register in a 16-bit general-purpose register or in a
memory word.

Figure 6-6
Task gate descriptor.

Only procedures running at privilege level 0 (usually operating system procedures)
in protected mode can execute LTR. STR is not privileged. These are always 16-bit
operations. The operand-size attribute has no effect on them. Remember that in
practice, the processor transfers the hidden part of the task register or descriptor along
with the visible part.

The operating system generally uses LTR only to initialize the task register. Task
switches then change the value as needed.


Task Gate Descriptors 


A task gate descriptor provides an indirect, guarded entry point to a task state segment. 
Figure 6-6 shows a task gate’s format. It contains the following fields: 
• Selector for a task state segment descriptor.
• P (present) bit that indicates whether the descriptor is currently in
memory.
• DPL (descriptor privilege level). This is the privilege level required to
use the gate.
Note the differences between a task gate and a call gate:
• A task gate refers to a task state segment descriptor, not to an actual entry
point. Thus a task gate has no offset field. There is an extra level of
indirection here.
• Task gates do not allow parameter passing, so they do not contain a
parameter count field. Since the stack pointer changes during a task
switch, tasks cannot communicate through the stack. They must
communicate instead through shared memory areas. The precise method, of
course, is OS-dependent.

Figure 6-7
Indirect access to a task descriptor through several task gates.

Task gates provide flexibility to systems for the following reasons:

• Several task gates may select the same descriptor, thus providing entry
points at different privilege levels or through the local descriptor table or
interrupt descriptor table.

• Task switches can be limited to specific tasks through entries in the local
descriptor table.

• Tasks can be activated by interrupts or exceptions through entries in the
interrupt descriptor table.

Figure 6-7 shows a single task descriptor that the processor can access directly or
through the local descriptor table or interrupt descriptor table. Remember that the task 
descriptor itself must reside in the global descriptor table. 
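
A task gate is simple enough to build by hand. The Python sketch below assumes the
standard 80386 gate layout, in which a task gate has type code 5 and carries only a
selector, a present bit, and a DPL. The helper names and the selector value 0028
hex are made up for the example.

```python
import struct

TASK_GATE_TYPE = 0x5    # descriptor type code for a task gate


def make_task_gate(tss_selector, dpl=0, present=True):
    """Return the 8-byte task gate that refers to a TSS descriptor."""
    access = (int(present) << 7) | ((dpl & 3) << 5) | TASK_GATE_TYPE
    # Everything except the selector and the access byte is unused:
    # a task gate has no offset field and no parameter count.
    return struct.pack("<HHBBH", 0, tss_selector & 0xFFFF, 0, access, 0)


def gate_selector(gate):
    """Extract the TSS descriptor selector from a task gate."""
    return struct.unpack_from("<H", gate, 2)[0]


# Two gates at different privilege levels that select the same task.
system_gate = make_task_gate(0x0028, dpl=0)
user_gate = make_task_gate(0x0028, dpl=3)
print(system_gate.hex(), user_gate.hex())
```

The two gates differ only in their DPL fields, which is how one task can have entry
points at several privilege levels while its single TSS descriptor stays in the global
descriptor table.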

In practice, one would often implement interrupt service routines as tasks because 
they require their own contexts. Since an interrupt can occur at any time, its handler 
must operate independently of the suspended program. An interrupt handler usually 
goes through what amounts to a task switch anyway. That is, it saves all the registers 
and then does some initialization. Exception handlers, on the other hand, often must 
use the context of the task that incurred them to correct or describe conditions. Note 
that interrupt and exception handlers implemented as tasks need not save and restore 
registers, as the task switch does that automatically. 


TASK SWITCHING 


A task switch can occur through a task state segment descriptor or a task gate. As noted 
above, the processor can access a task gate from the current task or via an interrupt or 
exception. A task switch also occurs when the current task does a return with its NT 
(nested task) flag set. 

How does the processor know whether to do a task switch? It knows either from the 
type of descriptor referenced or from the NT flag. Remember that each descriptor has 
a type field. The reference can be either to a task state segment descriptor or to a task 
gate. 

A task switch proceeds as follows, assuming that the new task is present and has a 
valid limit: 

• The processor checks whether privilege levels allow the switch. If they
do not, a general protection exception occurs (see Chapter 7).

• The processor saves the current task’s state. The base address of the
current task state segment is in the hidden part of the task register. The
processor saves all general-purpose registers and segment registers, the
flag register, and the instruction pointer in the task state segment. This is
roughly equivalent to a series of MOVs plus an interrupt response.

• The processor loads the task register with the selector of the new task’s
TSS descriptor. It also loads the hidden part of the descriptor at this time.

• The processor loads the new task’s state from its TSS and executes it.
The state includes all general-purpose and segment registers, the local
descriptor table register, the flags, and the page directory base register.
Loading it is roughly equivalent to a series of MOVs plus an interrupt
return.

The instructions that can cause a task switch are JMP, CALL, and IRET. Interrupts and
exceptions can also cause a task switch (through the interrupt descriptor table as shown
in Figure 6-7).

A task switch resembles an interrupt response. The processor always saves the pre- 
vious task’s state. If that task is resumed, it starts after the instruction that caused the 
switch. The registers have the values they had before the switch, much as they would 
if an interrupt had occurred. 

A task switch sets the TS (task switched, probably not what you were thinking) bit
in control register 0 (bit 3; see Figure 2-4). Its main use is to show whether the numeric
coprocessor’s state is related to the current task. If TS is set, a numeric coprocessor
instruction causes an exception. The handler must then determine whether a different
task is now active from the one that did the last instruction. If so, the handler must save
the coprocessor’s state in the previous task’s TSS. This allows the operating system to
avoid saving the coprocessor’s state if intervening tasks do not use it anyway. Chap- 
ter 8 describes coprocessors in more detail. The tradeoff here is a good one, since rela- 
tively few tasks use a numeric coprocessor in most systems. 
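
The deferred save can be pictured with a toy model. In the Python sketch below, the
Machine class, its attribute names, and the single integer standing in for the
coprocessor's registers are all inventions for illustration; a real handler would save
and restore the full coprocessor state.

```python
class Machine:
    """Toy model of the TS bit and a lazily saved coprocessor state."""

    def __init__(self, first_task):
        self.current = first_task
        self.ts = False          # TS bit in control register 0
        self.fpu_owner = None    # task whose state is in the coprocessor
        self.fpu_state = 0       # stand-in for the coprocessor registers
        self.saved = {}          # saved coprocessor images, keyed by task

    def task_switch(self, new_task):
        self.current = new_task
        self.ts = True           # every task switch sets TS

    def fpu_instruction(self, value):
        if self.ts:              # models the exception and its handler
            if self.fpu_owner not in (None, self.current):
                # A different task used the coprocessor last: save its
                # state and bring in the current task's (or a fresh one).
                self.saved[self.fpu_owner] = self.fpu_state
                self.fpu_state = self.saved.get(self.current, 0)
            self.ts = False
        self.fpu_owner = self.current
        self.fpu_state += value


m = Machine("A")
m.fpu_instruction(5)     # A now owns the coprocessor
m.task_switch("B")       # B never touches the coprocessor
m.task_switch("A")
m.fpu_instruction(1)     # same owner, so nothing is saved
saves_so_far = len(m.saved)
m.task_switch("C")
m.fpu_instruction(2)     # different owner, so A's state is saved first
```

Task B's turn costs nothing extra in this model, which is the point of the tradeoff:
the save happens only when a second task actually issues a coprocessor instruction.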

Note that the new task’s privilege level is not related to the old task’s. After all, tasks 
have their own address spaces and task state segments. Furthermore, privilege rules 
prevent improper access to a task state segment. The new task’s execution level is the 
RPL of the code segment selector value loaded from the task state segment. 

Task switches do not change systemwide resources. These are control register 0, the
global and interrupt descriptor table registers, and control register 2 (the page fault
linear address).

Task switches have both advantages and disadvantages. On the positive side, they 
provide a standard way to switch control between independent programs. The new task 
has its own context. It does not share limited resources such as stack space with the old 
task. On the negative side, task switches may be time-consuming and inefficient. For 
example, a simple keyboard interrupt handler would only need to save and restore a 
few registers rather than the complete machine state used in a task switch. The same 
would also apply to serial communications, printer, or real-time clock interrupts. Such 
handlers would therefore probably not be implemented as tasks.


TASK LINKING


How does a task return control to its predecessor? Note that the entire address space
may have changed, so a link in the stack is not sufficient. Instead, the 80386 fills the
back link of the new task’s TSS (see Figure 6-1) with the selector of the old task’s TSS.
It also sets the NT bit in the new task’s flag register, indicating that the back-link field
is valid. The new task must end with IRET. The processor will then switch back to the
task in the back-link field if the NT flag is set. This applies to new tasks entered through
CALLs or through interrupts or exceptions but not to those entered through JMPs.

One problem is that a chain of back links may become circular. What happens if the 
processor eventually tries to return to the original task? The chain may, of course, be
quite long so its circularity is far from obvious. For example, such a chain could result 
from a series of interrupts in a multilevel nested system. 

The solution is to use the B (busy) bit of the task state segment descriptor (see Figure
6-4). Note that this bit is part of the type information; a busy task is actually a
different type from a nonbusy task. The procedure is as follows:


1. The processor automatically sets the busy bit of each new task when it is activated.

2. When switching from a task, the processor clears its busy bit if it is not to be placed
on the back-link chain. This is the case if the switch occurs because of a JMP or
IRET instruction. Otherwise, the B bit remains set. This happens if the switch is the
result of a CALL, an interrupt, or an exception. B thus indicates that the task is either
currently active or will be resumed later.

3. When switching to a task, the processor causes an exception if the busy bit is set.
Thus a task cannot switch to any task on the back-link chain, no matter how long
the chain is.
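
The three rules above can be tried out in a small simulation. The Python sketch below
is a toy model, not a description of the processor's internals: the Task and
Processor classes and the TaskSwitchError exception are hypothetical names, and only
the busy bit, NT flag, and back link are modeled.

```python
class TaskSwitchError(Exception):
    """Stands in for the exception raised on a switch to a busy task."""


class Task:
    def __init__(self, name):
        self.name = name
        self.busy = False        # B bit in the task's TSS descriptor
        self.nt = False          # NT flag in the task's saved flags
        self.back_link = None    # back-link field in the task's TSS


class Processor:
    def __init__(self, first_task):
        first_task.busy = True
        self.current = first_task

    def _check_not_busy(self, new):
        if new.busy:                        # rule 3
            raise TaskSwitchError(new.name + " is busy")

    def jmp(self, new):
        """Switch without nesting: the old task leaves the chain."""
        self._check_not_busy(new)
        self.current.busy = False           # rule 2
        new.busy = True                     # rule 1
        self.current = new

    def call(self, new):
        """Switch with nesting (CALL, interrupt, or exception)."""
        self._check_not_busy(new)
        new.back_link = self.current        # the old task stays busy
        new.nt = True
        new.busy = True                     # rule 1
        self.current = new

    def iret(self):
        """Return along the back link; valid only when NT is set."""
        old = self.current
        if not old.nt:
            raise TaskSwitchError("NT is clear, so there is no back link")
        old.busy = False                    # rule 2
        old.nt = False
        self.current = old.back_link


a, b, c = Task("A"), Task("B"), Task("C")
cpu = Processor(a)
cpu.call(b)              # A calls B; A stays busy
cpu.call(c)              # B calls C; the chain is now C, B, A
try:
    cpu.call(a)          # would close the chain into a circle
    rejected = False
except TaskSwitchError:
    rejected = True
cpu.iret()               # back to B
cpu.iret()               # back to A
```

The attempt to call A from C fails because A is still busy, so the chain can never
loop back on itself regardless of its length.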


What happens if the system must remove a task from the back-link chain? Suppose, 
for example, that an error occurs and the task cannot be resumed. Now the operating 
system must change the back-link field in the TSS of the next task on the chain. It must 
then clear the B bit in the TSS descriptor of the task that must be removed. If the error
is a fatal one such as a double bus error or a memory parity error, the system must
remove all links before exiting. Otherwise, the system could leave tasks in the busy
state erroneously, and other tasks could not transfer control to them. This is one aspect
of the general problem of removing “dead” tasks from multitasking systems. 


Note also that the operating system should clear unused back links. The clearing
will make accidental references to them cause exceptions rather than task switches with 
unpredictable results. 


TASK ADDRESS SPACES 


Several parameters define a task’s address space. They are: 

• The global descriptor table. All tasks can access addresses defined
through this table.

• The local descriptor table. Only a specific task can access addresses
defined through this table unless several tasks share a descriptor table or
descriptors actually refer to the same address space (that is, they are
aliases of each other).

Since the TSS (Figure 6-1) includes values for both the local descriptor table register 
and the page directory base register, tasks may have separate or shared address spaces 
at any level. Modules may cooperate through shared address spaces. The operating sys- 
tem may use the same page directory for all tasks or may assign them different 
directories. 

Tasks can share address spaces in the following ways: 

• By using entries in the global descriptor table. All tasks have access to
these descriptors. No exclusion or restriction is possible.

• By sharing a local descriptor table. This approach is more restrictive than
the one based on the global descriptor table. Some tasks may have
access to the space whereas others may not. However, all local descriptors
are always part of the shared area.

• By using entries from different local descriptor tables that point to the
same linear address space. As just noted, we commonly call such
descriptors aliases since they are different names for the same thing. This
method of sharing addresses is even more restrictive than the sharing of
local descriptor tables, as it may involve only a few descriptors. Other
entries in the local descriptor tables may point to distinct linear addresses.

Programs can share data through a shared memory facility. The facility could
consist of a public region implemented using shared page tables.


I/O PRIVILEGE LEVELS


One problem with multitasking and multiuser systems is managing the use of shared
I/O devices. For example, suppose we have a multitasking personal computer with a
typical complement of I/O devices: keyboard, video display, disk, and printer. The
operating system must control access to these devices. Otherwise, several tasks could
try to use them at the same time. You could, for example, find compiler warning or
error messages in the middle of a typed document. Or the results of a spreadsheet could
appear in a program listing or on mailing labels. Similarly, consecutive keystrokes
could end up serving as inputs to different programs.

The way to avoid such confusion is to assign each task an input/output privilege
level (IOPL) that is more privileged (numerically lower) than its operating (current)
privilege level. Then an attempt by the task to do I/O causes a general protection
exception. The operating system takes control and manages the I/O. IOPL is bits 12
and 13 of the extended flags (see Figure 2-3). Hence its value is part of the status in
the TSS (Figure 6-1).

For example, in a simple user/supervisor system with privilege levels 0 and 3, all
user tasks would have IOPL = 0. Thus an attempt by a user task to do I/O would cause
a supervisor call. Of course, the supervisor action would be transparent as far as user
tasks were concerned. The only noticeable effect would be long, irregular execution
times for trapped instructions (IN, INS, OUT, OUTS, INT n, IRET, PUSHF, POPF,
STI, and CLI). IRET, PUSHF, and POPF are included here because they can change
IF (and hence IOPL).

IOPL and I/O permission maps (our next subject) apply also in V86 mode but not 
in real mode. In V86 mode, using IOPL can keep 8086 applications programs from 
disabling interrupts for long periods, reprogramming disk controllers, or accessing 
other hardware devices directly. The V86 monitor can then emulate the functions and 
devices without interfering with other tasks. Emulation can also allow programs that 
manipulate hardware directly to run on newer computers. New machines may, for ex- 
ample, have more powerful interrupt, CRT, DMA, and disk controllers than older com- 
puters. The monitor can direct manipulations to virtual I/O devices, which then convert 
them into operations for the new devices. 


I/O PERMISSION MAPS


How can we allow tasks to use some I/O devices but not others? For example, we might
want to allow:

• Each user on a multiuser system to access his or her own console and
printer but not anyone else’s or systemwide devices.

• A particular task to access a specific I/O device such as a high-speed
analog channel or a signal or image processing board. Obviously, this is
particularly important for real-time devices where the overhead caused
by an exception and an OS call would be intolerable.

• Foreground (high-priority interactive) tasks to access the keyboard,
display, or printer on a single-user machine, while background (low-priority
batch) tasks cannot. The operating system may provide virtual devices
(usually disk files) for background tasks to use.

• V86 tasks to access certain I/O devices (such as a CRT controller or I/O
port) directly but not others (such as disk, interrupt, or DMA controllers).

Figure 6-8
I/O address permission bit map.

The solution is to use an I/O permission bit map (also called an I/O guard map). It
is accessible from the task state segment as shown in Figure 6-8. The two
highest-addressed bytes in the TSS contain a pointer to the I/O permission map. Each
bit in the map corresponds to an I/O port byte address. If the bit’s value is 1, the task
cannot access the port. That is, IOPL applies. If the bit is 0, the port is accessible
despite IOPL’s value.

For example, suppose a task tries to do output to port 35 (decimal). If the task’s
IOPL is more privileged than its CPL, the processor examines the I/O permission bit
map to see if I/O may be allowed anyway. The controlling bit is bit 3 in the byte 4
beyond the map’s base address. That is, bit 35 is equivalent to bit 3 of byte 4. I/O is
allowed if that bit is 0.

Note the following about I/O permission bit maps: 

• Their sizes must be included in the limit specified in the TSS descriptor.

• They extend only as far as the TSS descriptor limit. Any ports that are
not in the map are assumed to be inaccessible. That is, their permission
bits are 1 by default. So you need not include a map if IOPL is to apply
to a task. Nor do you have to specify ports beyond the highest numbered
accessible one.

• Operations involving multibyte ports are allowed only if all bytes have
0 permission bits.

• They must end with an all-1s pad byte. It ensures that the processor can
read the last byte. Reading occurs on a word basis.

• If a task state segment has no bit map, its map base pointer should be set
to the segment limit. It should not be zero! A sure way to define a null
permission map is to set the base pointer to FFFF hex. Figure 6-9 shows
both explicit and null maps.
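
The checks described in this section can be summarized in a few lines of Python. The
sketch below is our own model of the rules, not a transcription of the processor's
microcode; the io_allowed function and the small map covering ports 0 through 63
are assumptions made for the example. The map sits right after a minimum TSS, at
offset 104, and only port 35 is opened.

```python
def io_allowed(cpl, iopl, port, width, tss, map_base, tss_limit):
    """Return True if a task may access `width` bytes starting at `port`."""
    if cpl <= iopl:
        return True                   # IOPL permits the access outright
    for p in range(port, port + width):
        offset = map_base + p // 8    # byte of the map that holds this bit
        # The map is read a word at a time, so the following byte must
        # also lie within the TSS limit (hence the all-1s pad byte).
        if offset + 1 > tss_limit:
            return False              # outside the map: treated as a 1 bit
        if tss[offset] & (1 << (p % 8)):
            return False              # bit is 1: access denied
    return True


MAP_BASE = 104                        # map follows the minimum TSS
port_map = bytearray([0xFF] * 8)      # ports 0-63, all denied to start
port_map[35 // 8] &= ~(1 << (35 % 8)) & 0xFF    # clear bit 3 of byte 4
tss = bytes(MAP_BASE) + bytes(port_map) + b"\xFF"   # all-1s pad byte
TSS_LIMIT = len(tss) - 1

# A level 3 task with IOPL = 0 must rely on the map.
print(io_allowed(3, 0, 35, 1, tss, MAP_BASE, TSS_LIMIT))   # port 35 is open
print(io_allowed(3, 0, 35, 2, tss, MAP_BASE, TSS_LIMIT))   # port 36 is not
```

A word access at port 35 fails because it also touches port 36, whose bit is still 1.
A port such as 200, which lies beyond the end of the map, is likewise refused.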

I/O permission bit maps are a new feature of the 80386. Previous processors did not
have them. 


INITIALIZATION OF TASKING SYSTEMS


You can use Intel’s BLD386 or other similar tools to create task state segments, task 
state segment descriptors, task gates, and entries in global, local, and interrupt descrip- 
tor tables. BLD386 can obtain initialization information from user input or from a 
specified input module. It will even create TSSs automatically when the user does not 
provide specific task definitions. Of course, you can also create all the structures 
manually if you have the will power. 
The initialization information for TSSs can include:
• Task name
• TSS descriptor
• Task LDT selector
• Initial code and data segment values
• Stack selectors
• I/O privilege, interrupt status, and trap flag values
• Base address
• Size

Figure 6-9
Explicit and null I/O permission maps.

BLD386 creates a TSS descriptor for each TSS. You can specify the values for:
• Present bit
• Descriptor privilege level
• Base address
• Limit
BLD386 can also create task gates. You can specify:
• Gate name
• Entry point
• Descriptor privilege level
• Present bit
• Type (80286 or 80386)


MULTITASK;

SEGMENT
    NUCLEUS (DPL = 0);

TABLE
    COMMONLDT (ENTRY = (MOD1, MOD2, SDATA));

TASK
    TASK1BLOCK (OBJECT = MOD1,                  --DPL=3
                LDT = COMMONLDT,
                STACKS = (NUCLEUS.STACK1)),     --DPL=0
    TASK2BLOCK (OBJECT = MOD2,                  --DPL=3
                LDT = COMMONLDT,
                STACKS = (NUCLEUS.STACK2));     --DPL=0

GATE
    TASK1GATE (TASK, ENTRY = TASK1BLOCK);
    TASK2GATE (TASK, ENTRY = TASK2BLOCK);

TABLE
    GDT (ENTRY = (NUCLEUS,
                  COMMONLDT,
                  TASK1BLOCK,
                  TASK1GATE,
                  TASK2BLOCK,
                  TASK2GATE));

END


Figure 6-10
Example BLD386 program for a two-task module.

Figure 6-10 shows an example BLD386 program for a two-task module. The tasks 
share a local descriptor table COMMONLDT and a global data area SDATA (for 
shared data). The LDT contains descriptors for all segments in modules MOD1 and 
MOD2. Both tasks have stack segments defined in module NUCLEUS. The global 
descriptor table, defined at the bottom, has entries corresponding to all segments in the 
module NUCLEUS, the local descriptor table, the two task state segments, and the two 
task gates. Note that we need only specify a few parameters. BLD386 takes care of the 
rest by default. 

You must also create a dummy TSS and a valid TSS descriptor for the operating 
system to use in the first task switch. The dummy serves as the nonexistent previous
task in this case. The operating system must use LTR to load the dummy’s selector into
the task register. It can then start the first task with a JMP TSS instruction. The 80386 
will write the machine state into the dummy TSS just as though it had switched from a 
real task. Note that you must use JMP TSS to avoid creating a back link. 
A V86 task must have the following special initialization: 
• VM bit in EFLAGS set to 1
• CS selector field set to the linear base address of the task’s initial code
segment divided by 16
• IP field set to the task’s entry point
• IOPL field in EFLAGS set to 3 if the task can access the I flag and to 0
otherwise
• LDT selector field set to 0 unless an interrupt or exception procedure
uses an LDT.
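
As a small worked example of these settings, the Python sketch below computes the
initial field values for a V86 task. The function name, the dictionary keys, and the
sample addresses are hypothetical; the bit positions assume VM at bit 17 and IOPL at
bits 12 and 13 of EFLAGS.

```python
VM_FLAG = 1 << 17     # VM bit in EFLAGS
IOPL_SHIFT = 12       # IOPL occupies bits 12 and 13 of EFLAGS


def v86_initial_state(code_base, entry_offset, may_change_if):
    """Return initial CS, IP, EFLAGS, and LDT values for a V86 task."""
    if code_base % 16 or code_base // 16 > 0xFFFF:
        raise ValueError("code base must be a paragraph address that "
                         "fits a 16-bit segment value")
    iopl = 3 if may_change_if else 0
    return {
        "cs": code_base // 16,            # 8086-style segment value
        "ip": entry_offset & 0xFFFF,
        "eflags": VM_FLAG | (iopl << IOPL_SHIFT),
        "ldt": 0,                         # no LDT needed by default
    }


# A task whose code segment starts at linear address 12340 hex.
state = v86_initial_state(0x12340, 0x0100, may_change_if=True)
print(hex(state["cs"]), hex(state["ip"]), hex(state["eflags"]))
```

Note that the CS field holds an 8086-style segment value here rather than a
protected-mode selector, which is why the linear base address is divided by 16.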


SUMMARY 


The 80386 has many features aimed at efficient implementation of independent
programs or tasks. Each task’s current state is in a data structure called a task state
segment (TSS). The TSS contains initial values for the general-purpose registers,
segment registers, local descriptor table register, flags, instruction pointer, page
directory base register, and stack pointers (and stack segment registers) for all
privilege levels at which
the task may run. The TSS also contains a back link to the previous TSS. The link is 
valid if another task called the current task and must regain control from it on termina- 
tion. 

Each TSS has a corresponding descriptor in the global descriptor table. The descrip- 
tor contains a base address, a limit, a privilege level, and a busy bit. The task register 
contains the selector for the currently active TSS. The register’s hidden part contains 
the descriptor’s base address and limit. 

Task gates provide indirect, guarded entry points to task state segments. A task gate 
contains a selector for a TSS descriptor, a present bit, and a privilege level. The use of 
task gates allows interrupt or exception handlers to be implemented as tasks. This
approach is particularly applicable to complex interrupt handlers; it is less well-suited
to simple interrupt handlers and most exception handlers.

A task switch involves saving the current task’s status in its TSS and loading the
new task’s status from its TSS. Such a switch occurs when a task jumps to or calls a
TSS descriptor or a task gate. It also occurs when a task does a return with its NT (nested
task) flag set. Task switches work much like a transfer of control to an interrupt
service routine. Tasks can be linked to any depth through the back-link field in their TSSs.

Tasks can have distinct (private) or shared address spaces. The usual way to obtain 
private address spaces is through separate local descriptor tables. The usual ways to 
share address spaces are through the global descriptor table, common local descriptor 
tables, or descriptors that point to the same linear address space (aliases). These ap- 
proaches differ in terms of how much you can limit the sharing. 

Tasks may have restricted ability to do I/O. The I/O privilege level (IOPL) in the 
flag register determines the minimum privilege level at which I/O is permitted. Tasks 
running at less privileged levels can do I/O only under the control of more privileged 
levels. However, the I/O permission bit map in the task state segment can remove this 
restriction for specific ports. Restrictions on I/O are essential for managing shared 
devices on multitasking and multiuser systems. They also allow newer systems to run 
programs that assume older I/O configurations. 


80386 Exceptions 
and Debugging 
Features 


No rule is so general, which admits not some exception. 
Burton, The Anatomy of Melancholy 


I never make exceptions. An exception disproves the rule. 
Sir Arthur Conan Doyle, The Sign of Four 


The first 90 percent of the code accounts for the first 
90 percent of the development time. The remaining 
10 percent of the code accounts for the other 90 percent of 
the development time. 
Tom Cargill 


This chapter covers 80386 exceptions and debugging features. Exceptions, as noted 
earlier in a discussion of interrupts, are internal conditions or instructions that make 
the 80386 suspend its normal activities and do special routines. Typical causes are 
overflow, division by zero, page faults, and protection violations. The chapter describes the 
sources of exceptions, their classification and identification numbers, interrupt descriptor 
tables, interrupt and trap gates, interrupt tasks and procedures, error codes, and 
exception conditions. The last section presents the 80386’s special debugging features. 


NEW 80386 FEATURES 


Exceptions work much the same on the 80386 as on its predecessor, the 80286. The 
differences are: 
• The use of interrupt 1 for general debugging exceptions rather than just 
for single-step inputs. This change reflects the introduction of hardware 
debugging features in the 80386. 
• The introduction of exception types (faults, traps, and aborts). 
• The addition of page fault and coprocessor error exceptions. Paging is a 
new feature in the 80386, and page fault exceptions are the key to creating 
demand-paged virtual memory systems. 
• Revised definition of double faults so that they occur only if the processor 
detects a serious exception while processing another serious exception. 
Double faults are thus far less likely on the 80386 than on the 80286. 
The 80386 also has new or revised exception conditions resulting from its larger 
task state segments and the addition of two new segment registers (FS and GS). 
The 80386 has greatly expanded hardware debugging features. The additions are 
four debug address registers, along with control and status registers. They allow debug- 
gers to specify instruction or data breakpoints in hardware rather than in software. 


SOURCES OF EXCEPTIONS 


The 80386 handles exceptions much as it handles the external interrupts discussed in 
Chapter 4. In fact, we may consider interrupts as special cases of the more general 
category of exceptions. Table 7-1 summarizes 80386 exceptions. This table overlaps 
considerably with the interrupt summary in Table 4-3. 
There are two types of exceptions: 
• Processor detected. 
• Programmed. These are the so-called software interrupts INTO, INT 3 
(a special 1-byte instruction), INT n, and BOUND. They are instructions 
whose sole purpose is to cause an exception. Their most common uses 
are in data and address validation and in debugging (see the last section 
of this chapter). 


Table 7-1 
Summary of 80386 Exceptions 

                                        Return Address 
                             Interrupt  Points to Faulting  Exception      Function That Can Generate 
Description                  Number     Instruction         Type           the Exception 

Divide error                 0          YES                 FAULT          DIV, IDIV 
Debug exceptions             1          Depends on type     FAULT or TRAP  Any instruction 
Breakpoint                   3          NO                  TRAP           One-byte INT 3 
Overflow                     4          NO                  TRAP           INTO 
Bounds check                 5          YES                 FAULT          BOUND 
Invalid opcode               6          YES                 FAULT          Any illegal instruction 
Coprocessor not available    7          YES                 FAULT          ESC, WAIT 
Double fault                 8          YES                 ABORT          Any instruction that can generate 
                                                                           an exception 
Coprocessor Segment Overrun  9          NO                  ABORT          Any operand of an ESC instruction 
                                                                           that wraps around the end of a 
                                                                           segment 
Invalid TSS                  10         YES                 FAULT          JMP, CALL, IRET, any interrupt 
Segment not present          11         YES                 FAULT          Any segment-register modifier 
Stack exception              12         YES                 FAULT          Any memory reference thru SS 
General Protection           13         YES                 FAULT/ABORT    Any memory reference or code fetch 
Page fault                   14         YES                 FAULT          Any memory reference or code fetch 
Coprocessor error            16         YES                 FAULT          ESC, WAIT 
Two-byte SW Interrupt        0-255      NO                  TRAP           INT n 


We may further classify processor-detected exceptions as (in order of increasing 
seriousness) faults, traps, and aborts. Table 7-1 lists the type of each exception and 
indicates whether the return address points to the faulting instruction. It also notes what 
instructions or references can cause the exception. 


Table 7-2 
Priority Among Simultaneous Exceptions and Interrupts 

Priority    Class of Interrupt or Exception 

HIGHEST     Faults except debug faults 
            Trap instructions INTO, INT n, INT 3 
            Debug traps for this instruction 
            Debug faults for next instruction 
            NMI interrupt 
LOWEST      INTR interrupt 

Most exceptions are faults (the least serious category). Faults (for example, instruction-address 
breakpoints, divide errors, and not-present conditions) are correctable. The processor reports 
them before it finishes executing the instruction. It saves the instruction’s address (code 
segment register and instruction pointer values) to allow restart. Note that unlike the 
Motorola 68010 and 68020, the 80386 cannot continue the instruction from the point 
at which it was interrupted. It can only start it over from the beginning. 

Traps (for example, single-step and task-switch breakpoint) do not need to be 
restarted. In fact, restarting them would cause an endless loop. The processor reports 
them when it reaches the next instruction. It saves that instruction’s address to allow 
program resumption. The next instruction may be the target of a jump. 

Aborts (double faults and other serious hardware errors) do not allow for restart or 
resumption. The processor does not save an instruction address. The operating system 
must take complete control of the situation in this case. 

Table 7-2 shows the priority among simultaneous interrupts and exceptions. The 
processor holds lower priority interrupts pending while it services the event with the 
highest priority. It discards lower priority exceptions but will rediscover them later 
upon eventual return to the point of interruption. 
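
The arbitration rule in Table 7-2 can be sketched in a few lines of Python. This is only a model under stated assumptions: the strings in PRIORITY are our own labels for the rows of the table, and arbitrate is a hypothetical helper, not something the processor or any operating system provides.

```python
# Rows of Table 7-2, highest priority first. The labels are ours.
PRIORITY = [
    "fault",             # faults except debug faults
    "trap instruction",  # INTO, INT n, INT 3
    "debug trap",        # debug traps for this instruction
    "debug fault",       # debug faults for the next instruction
    "NMI",
    "INTR",
]
INTERRUPTS = {"NMI", "INTR"}


def arbitrate(pending):
    """Pick the event to service from a non-empty collection of labels.

    Returns (serviced, held, discarded): lower-priority interrupts are
    held pending, while lower-priority exceptions are discarded and will
    be rediscovered when the interrupted instruction is retried.
    """
    ordered = sorted(pending, key=PRIORITY.index)
    serviced, rest = ordered[0], ordered[1:]
    held = [event for event in rest if event in INTERRUPTS]
    discarded = [event for event in rest if event not in INTERRUPTS]
    return serviced, held, discarded
```

For instance, with a fault, a debug trap, and an INTR request all pending, the model services the fault, holds INTR, and discards the debug trap.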


INTERRUPT DESCRIPTOR TABLE 





We have already mentioned the interrupt descriptor table in Chapter 4. However, our 
discussion there dealt only with the 8086-like case in which the table consists of 4-byte 
entries. More generally, the IDT is an array of 8-byte gate descriptors. Unlike the GDT, 
its first entry may contain a valid descriptor. Figure 7-1 shows the interrupt descriptor 
table and the IDT register, which contains its base pointer and limit. The operating system 
must ensure that the IDT contains valid descriptors for each interrupt or exception 
number that a computer uses. 


Figure 7-1 
Interrupt descriptor table and interrupt descriptor table register. 
(IDT register fields: BASE, LIMIT.) 


Figure 7-2 
Descriptors for 80386 task, interrupt, and trap gates. 
(Gate fields: OFFSET 31..16, P, DPL, type, not-used bits, SELECTOR, OFFSET 15..0.) 


Figure 7-3 
Vectoring to an interrupt service procedure. 


The generalized IDT may contain any of the following kinds of gates (see Figure 7-2): 
• Task gates (described in Chapter 6) 
• Interrupt gates 
• Trap gates 
Interrupt gates and trap gates resemble the call gates described in Chapter 5. They do 
not cause task switches as a task gate does. The only difference between an interrupt 
gate and a trap gate is that a transfer through an interrupt gate disables maskable 
interrupts, whereas a transfer through a trap gate does not. Disabling interrupts prevents 
later interrupts from interfering with the current handler. As you can see from Figure 
7-2, only a single bit in the status field differentiates between trap and interrupt gates. 
Exception handlers are usually invoked through trap gates. Only ones that require 
interrupts to be disabled are invoked through interrupt gates. 
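
As a rough sketch of how little separates the two kinds of gate, the following Python fragment packs and unpacks an 8-byte gate descriptor. The byte layout and the type codes (5 for a task gate, 0EH for an 80386 interrupt gate, 0FH for an 80386 trap gate) are assumptions taken from Intel's published descriptor format rather than from the text above, and pack_gate and unpack_gate are hypothetical helper names.

```python
import struct

# Type codes in the low 4 bits of the access byte (assumed from Intel's
# documented 80386 gate encoding).
TASK_GATE = 0x5
INTERRUPT_GATE_386 = 0xE
TRAP_GATE_386 = 0xF


def pack_gate(offset, selector, gate_type, dpl=0, present=True):
    """Build the 8 bytes of an IDT gate descriptor (little-endian)."""
    access = (0x80 if present else 0) | ((dpl & 3) << 5) | (gate_type & 0xF)
    return struct.pack(
        "<HHBBH",
        offset & 0xFFFF,          # OFFSET 15..0
        selector & 0xFFFF,        # SELECTOR
        0,                        # not used
        access,                   # P, DPL, type
        (offset >> 16) & 0xFFFF,  # OFFSET 31..16
    )


def unpack_gate(raw):
    """Decode an 8-byte gate descriptor into its fields."""
    low, selector, _unused, access, high = struct.unpack("<HHBBH", raw)
    return {
        "offset": (high << 16) | low,
        "selector": selector,
        "present": bool(access & 0x80),
        "dpl": (access >> 5) & 3,
        "type": access & 0xF,
    }
```

Packing the same handler address once as an interrupt gate and once as a trap gate gives descriptors that differ only in the lowest bit of the access byte, which matches the single-bit difference noted above.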


Figure 7-4 
Exception error code format. 


All three types of gates contain privilege levels. As mentioned in Chapter 6, exception 
handlers generally run at privilege level 0 in order to access operating system data. 
Interrupt gates or trap gates cause an indirect jump to a procedure that executes in 
the current task’s context as shown in Figure 7-3. Their descriptors may be in either 
the local or the global descriptor table, although exception handlers are usually in the 
GDT to avoid task dependence. The invoked procedure differs from an ordinary procedure 
in the following ways: 
• The initial transfer to it causes the EFLAGS register to be pushed onto 
the stack. Figure 4-10 shows the order in which the processor saves the 
status components (flags, instruction pointer, and code segment register). 
• Many exceptions also cause the processor to push an error code onto the 
stack. 
• The usual exit is via a 32-bit IRET instruction rather than an RET. IRET 
retrieves the flags from the stack (including the previous state of IF). 
• TF (the trap flag) is reset (cleared) after the processor saves EFLAGS on 
the stack. Single-stepping thus does not affect interrupt servicing. 


ERROR CODES 


For exceptions related to a specific segment, the processor pushes an error code onto 
the handler’s stack. The code ends up on top of the status (see Figure 4-10). Figure 7-4 
shows the usual error code format. It applies to all codes except those resulting from 
page faults. The fields are: 

• Selector index: the index field (upper 13 bits) of the segment selector. 

• I (IDT) bit: 1 if the index refers to a gate descriptor in the IDT, 0 otherwise. 

• TI bit: if I is 0, 0 indicates that the error code refers to the global descriptor 
table and 1 indicates the local descriptor table. It is meaningless if I 
is 1. 

• EXT bit: set (1) if an external event caused the exception, cleared (0) 
otherwise. 


Field   Value   Description 

U/S     0       The access causing the fault originated when the 
                processor was executing in supervisor mode. 
        1       The access causing the fault originated when the 
                processor was executing in user mode. 
W/R     0       The access causing the fault was a read. 
        1       The access causing the fault was a write. 
P       0       The fault was caused by a not-present page. 
        1       The fault was caused by a page-level protection violation. 

Figure 7-5 
Error code format for page fault errors. 


The error code for page faults is special (see Figure 7-5). It has 3 bits that indicate: 

• Whether the processor was in supervisor or user mode (U/S field). 0 
means supervisor, 1 user. 

• Whether the problem occurred during a read or a write (W/R field). 0 
means read, 1 write. 

• Whether the problem was a not-present page or a protection violation (P 
field). 0 means a not-present page, 1 a page-level protection violation. 


Table 7-3 summarizes the 80386’s exception error codes. Handlers must pop error 
codes from the stack before resuming the suspended program. 


Table 7-3 
80386 Exception Error Codes 

                             Interrupt 
Description                  Number     Error Code 

Divide error                 0          No 
Debug exceptions             1          No 
Breakpoint                   3          No 
Overflow                     4          No 
Bounds check                 5          No 
Invalid opcode               6          No 
Coprocessor not available    7          No 
Double fault                 8          Yes (always 0) 
Coprocessor Segment Overrun  9          No 
Invalid TSS                  10         Yes 
Segment not present          11         Yes 
Stack exception              12         Yes 
General protection fault     13         Yes 
Page fault                   14         Yes 
Coprocessor error            16         No 
Two-byte SW interrupt        0-255      No 
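
The two error code layouts described in this section can be illustrated with a short Python sketch. The bit positions used here (EXT in bit 0, I in bit 1, TI in bit 2, and the index in bits 15 through 3 for the usual format; P, W/R, and U/S in bits 0, 1, and 2 for page faults) are assumptions based on Intel's documented layout, and both function names are hypothetical.

```python
def decode_selector_error_code(code):
    """Split the usual (selector-style) error code into its fields."""
    return {
        "ext": bool(code & 0x1),         # EXT: an external event was the cause
        "idt": bool(code & 0x2),         # I: index refers to a gate in the IDT
        "ti": (code >> 2) & 1,           # TI: 0 = GDT, 1 = LDT (ignore if I = 1)
        "index": (code >> 3) & 0x1FFF,   # descriptor index within that table
    }


def decode_page_fault_error_code(code):
    """Split a page fault error code into its three flag bits."""
    return {
        "protection_violation": bool(code & 0x1),  # P: 0 = not-present page
        "write": bool(code & 0x2),                 # W/R: 0 = read, 1 = write
        "user": bool(code & 0x4),                  # U/S: 0 = supervisor, 1 = user
    }
```

A handler written along these lines would decode the value it pops from the stack before deciding how to respond. For example, a page fault error code of 6 describes a user-mode write to a not-present page.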


EXCEPTION CONDITIONS 


The general exception conditions are: 
 0 - divide error (divisor is zero during DIV or IDIV) 
 1 - debug exceptions (breakpoints, single-step, and other debugging conditions) 
 3 - 1-byte breakpoint (INT 3) 
 4 - overflow (INTO) 
 5 - bounds check (BOUND) 
 6 - invalid operation code 
 7 - coprocessor not available 
 8 - double fault 
 9 - coprocessor segment overrun 
10 - invalid task state segment 
11 - segment not present 
12 - stack exception 
13 - general protection exception 
14 - page fault 
16 - coprocessor error 
Note: Remember that the nonmaskable interrupt input uses identification code 2 and 
that Intel has reserved interrupts 15 and 17 through 31 without assigning any current 
functions to them (see Table 4-3). 
The following exceptions are useful for applications programmers: 


• Interrupt 0: divide error. This exception occurs during a divide instruction (DIV 
or IDIV) if the divisor is 0. 

• Interrupt 3: breakpoint. This is the standard software interrupt. Its 1-byte length 
makes it handy in debugging. By replacing the first byte of an instruction with it, 
you can make the program stop and return control to a monitor or operating system. 
Debuggers often use INT 3 to set breakpoints. Note that this approach does not work 
if the program is in ROM. 

• Interrupt 4: overflow. The INTO instruction is used to recognize arithmetic overflow. 
It causes an exception if the Overflow flag is set (1). 

• Interrupt 5: bounds check. This fault occurs if the processor finds an array 
reference outside the specified limits. The BOUND instruction activates it. 

Interrupt 1 depends on the debug registers discussed later in this chapter. Interrupts 
7, 9, and 16 depend on the coprocessor interface and hence are covered in Chapter 8. 




Invalid Operation Code Exception (Interrupt 6) 


This fault occurs when the execution unit detects an invalid operation code. It does not 
occur at instruction prefetch. No error code is pushed onto the stack. The fault also 
occurs when the execution unit detects an illegal type of operand, such as a register as the 
target of an intersegment jump. 





DOUBLE FAULTS 





A double fault is caused by a serious exception occurring while the processor is handling 
a previous serious exception. To determine when a double fault occurs, we must 
first divide exceptions into three classes as shown in Table 7-4, namely, benign exceptions, 
contributory exceptions, and page faults. Table 7-5 then shows which combinations 
of exceptions cause a double fault. The rule is like one of the spelling rules no 
one ever remembers (i before e except after c, etc.). A double fault requires two contributory 
exceptions, two page faults, or a contributory exception during the execution 
of the page fault handler. 


Table 7-4 
Double-Fault Detection Classes 

Class                     ID   Description 

Benign Exceptions          1   Debug exceptions 
                           2   NMI 
                           3   Breakpoint 
                           4   Overflow 
                           5   Bounds check 
                           6   Invalid opcode 
                           7   Coprocessor not available 
                          16   Coprocessor error 
Contributory Exceptions    0   Divide error 
                           9   Coprocessor Segment Overrun 
                          10   Invalid TSS 
                          11   Segment not present 
                          12   Stack exception 
                          13   General protection 
Page Faults               14   Page fault 


Table 7-5 
Double-Fault Definition 

FIRST EXCEPTION           SECOND EXCEPTION 
                          Benign       Contributory   Page 
                          Exception    Exception      Fault 

Benign Exception          OK           OK             OK 
Contributory Exception    OK           DOUBLE         OK 
Page Fault                OK           DOUBLE         DOUBLE 


Table 7-6 
Conditions that Cause an Invalid Task State Segment 

Error Code              Condition 

TSS id + EXT            The limit in the TSS descriptor is less than 103 
LDT id + EXT            Invalid LDT selector or LDT not present 
SS id + EXT             Stack segment selector is outside table limit 
SS id + EXT             Stack segment is not a writable segment 
SS id + EXT             Stack segment DPL does not match new CPL 
SS id + EXT             Stack segment selector RPL not equal to CPL 
CS id + EXT             Code segment selector is outside table limit 
CS id + EXT             Code segment selector does not refer to code segment 
CS id + EXT             DPL of non-conforming code segment not equal to new CPL 
CS id + EXT             DPL of conforming code segment greater than new CPL 
DS/ES/FS/GS id + EXT    DS, ES, FS, or GS segment selector is outside table limits 
DS/ES/FS/GS id + EXT    DS, ES, FS, or GS is not readable segment 

All double faults cause the processor to push an error code with value 0 onto the 
stack. The processor cannot restart the faulting instruction. If another exception occurs 
while the processor is trying to invoke the double fault handler, it simply shuts down 
completely. Only RESET or a nonmaskable interrupt can revive it at this point. 

Double fault handlers should be tasks, not procedures. The context in which the fault 
occurred may contain a wide variety of problems, such as a lack of stack space or an 
invalid segment selector. The safest course therefore is to switch tasks, thus guaranteeing 
the handler a valid context. 
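
For readers who would rather not memorize the rule, here is a minimal Python sketch of Tables 7-4 and 7-5. The function names are hypothetical. As a simplifying assumption, every vector that is neither contributory nor a page fault is treated as benign, including the two-byte software interrupts.

```python
BENIGN = "benign"
CONTRIBUTORY = "contributory"
PAGE_FAULT = "page fault"

# Contributory exceptions from Table 7-4.
CONTRIBUTORY_VECTORS = {0, 9, 10, 11, 12, 13}


def exception_class(vector):
    """Return the double-fault detection class for an interrupt number."""
    if vector == 14:
        return PAGE_FAULT
    if vector in CONTRIBUTORY_VECTORS:
        return CONTRIBUTORY
    return BENIGN


def is_double_fault(first, second):
    """Apply Table 7-5: does `second`, raised while handling `first`,
    produce a double fault?"""
    first_class = exception_class(first)
    second_class = exception_class(second)
    if first_class == CONTRIBUTORY:
        return second_class == CONTRIBUTORY
    if first_class == PAGE_FAULT:
        return second_class in (CONTRIBUTORY, PAGE_FAULT)
    return False
```

Note the asymmetry in the table: a page fault raised while the processor is handling a general protection exception is serviced normally, but a general protection exception raised while it is handling a page fault is a double fault.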


INVALID TASK STATE SEGMENT FAULTS 


Any condition listed in Table 7-6 can produce an invalid task state segment. The processor 
pushes an error code onto the stack to help identify the actual cause. The EXT bit 
indicates whether an outside condition such as an interrupt caused the exception. 

This exception occurs during a task switch. If the processor has not completely 
verified the new TSS, the context is still that of the old task. Otherwise, the context is 
that of the new task. 

One obvious problem here is that you must ensure that the handler has a valid TSS. 
Otherwise, you will end up with a double fault. Thus the handler must be a task 
invoked via a task gate. After all, you know that the current task state segment is invalid. 


SEGMENT NOT-PRESENT EXCEPTIONS 


These exceptions occur when the processor tries to use a descriptor with a zero present 
bit. The program could be loading a data segment register. (Problems with the SS 
register cause stack exceptions.) It could also be loading the LDT register with an LLDT 
instruction. Or it could be using a gate descriptor that is not present. 

This exception is clearly restartable. All that the exception handler must do in most 
cases is load the segment into memory and set the present bit in its descriptor. The 
interrupted program can then resume execution by restarting the faulting instruction. 

The most common use of this exception is to implement virtual memory at the seg- 
ment level. However, such implementations are uncommon because the variable size 
of segments makes them difficult to manage. Instead, most systems implement virtual 
memory at the page level. 

One problem here is that a not-present exception may occur during a task switch. 
Such a switch assigns new values to all segment registers and may make several of 
them invalid. For example, the new TSS could have been overwritten or its address 
could have been specified improperly. Therefore, the exception handler cannot 
reference memory via the current segment registers, since using them could cause 
another exception. The result would be a double fault (see Tables 7-4 and 7-5). 

How can we escape from this quandary? Either of the following approaches will 
work: 

• Implement the not-present fault handler as a task gate. The processor will 
load new segment registers from the handler’s TSS. However, this approach 
may make it difficult to determine why the original fault occurred, 
since only indirect access to the suspended routine’s state is possible. The 
handler must use the back link in its TSS to gain access. 

• Check all segment register images in the TSS, simulating the usual 
processor test. You can also force tests by PUSHing and POPping all 
segment registers. This approach involves some work but allows the exception 
handler to run in the interrupted task’s context. 

Segment not-present exceptions may have special significance if their cause is a 
not-present bit in a gate descriptor. Systems software may use that indicator for special 
functions such as network transfers or the emulation of features that are under 
development. 





STACK EXCEPTIONS 


Two general conditions cause stack exceptions: 

• Limit violations in operations that use the SS register. Such operations 
include POP, PUSH, ENTER, and LEAVE, as well as instructions that 
use SS implicitly (by addressing through ESP or EBP) or explicitly 
(through a segment override). Typical examples of such instructions are 
MOV ECX,[EBP] and MOV EAX,SS:[ESI+7]. 

• An attempt to load SS with a not-present descriptor. Note that this results 
in a stack exception, not a segment not-present exception. 

Stack exceptions cause the processor to push an error code onto the exception 
handler’s stack. The code is usually zero. It contains a selector to a segment if a not- 
present stack segment caused the exception or if an interlevel CALL caused the new 
stack to overflow. 

Stack exceptions are always restartable. Note that, in the case of a not-present 
descriptor, the instruction to be restarted is the first instruction of the new task. 

The same problem with regard to task switches applies here as with segment not- 
present exceptions. Other segment registers may not contain valid values. The solu- 
tions described in the previous section apply here also. 





GENERAL PROTECTION EXCEPTIONS 





This category is the grab bag for all protection violations that don’t fit anywhere else. 
Among the possibilities are: 

• Exceeding the limits of segments or descriptor tables. Note, however, 
that limit violations involving the SS register cause stack exceptions. 

• Violating the accessibility of a segment. This sounds either dangerous or 
disgusting. Typical examples are transferring control to a segment that 
is not executable, writing into one that is executable or read-only, reading 
from one that is execute-only, and loading a data segment register with 
a descriptor for a system or unreadable segment. 

• Trying to use a null selector. 

• Switching to a busy task. 

• Violating privilege rules. Also trying to exit V86 mode via a trap or interrupt 
gate to a nonzero privilege level. 

• Trying to enter an impossible state, such as one that has paging enabled 
(PG = 1) without protection (PE = 0). 

In response to a general protection exception, the processor pushes an error code 
onto the handler’s stack. The error code is zero unless loading a descriptor caused the 
exception. In that case, the error code contains a selector to the descriptor. This selec- 
tor may refer to an instruction operand, a gate, or a task state segment. 

You can avoid many protection exceptions or clarify their causes by testing segment 
selectors ahead of time. The VERR (verify for reading), VERW (verify for writing), 
LAR (load access rights), and LSL (load segment limit) instructions are useful for this 
purpose. VERR, VERW, LAR, and LSL are not privileged. 


PAGE FAULTS 





This exception usually means that an accessed page is not currently in memory and 
must be loaded from disk. It is the key to demand-paged virtual memory systems. 
Another possibility is that the page is present but the procedure is not privileged enough 
to access it. 

Besides saving a special error code (see Figure 7-5) on the stack, the processor also 
saves the linear address that caused the exception. This address ends up in control 
register 2 (CR2). The exception handler needs it to locate the page directory and page table 
entries. 
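
As an illustration of how a handler might use the saved linear address, the sketch below splits it into a page directory index, a page table index, and a byte offset. It assumes the standard 80386 paging layout (10-bit directory index, 10-bit table index, 12-bit offset, 4-byte entries), which this chapter does not restate, and all three function names are hypothetical.

```python
def split_linear_address(linear):
    """Return (directory index, table index, byte offset) for an address."""
    directory_index = (linear >> 22) & 0x3FF
    table_index = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF
    return directory_index, table_index, offset


def directory_entry_address(directory_base, linear):
    """Physical address of the page directory entry for `linear`."""
    directory_index, _, _ = split_linear_address(linear)
    return directory_base + 4 * directory_index


def table_entry_address(table_base, linear):
    """Physical address of the page table entry for `linear`, given the
    page table base taken from a present page directory entry."""
    _, table_index, _ = split_linear_address(linear)
    return table_base + 4 * table_index
```

A real handler would read the directory entry first and would look at the page table only if that entry is marked present; otherwise the page table itself must be brought in.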

One problem is that page faults may occur during task switches. These are difficult 
to handle because they may occur before or after the change of context or before the 
new context has been verified. The only way to ensure a valid context for the page fault 
handler is to invoke it via a task gate. 

Another problem is that a page fault may occur with an invalid stack pointer. The 
usual cause is a page fault in the middle of the 8086 sequence: 
    MOV SS,AX 
    MOV SP,STACKTOP 
The result is a mixture of new stack segment and old stack pointer. The solution is to 
use the LSS (load full stack pointer) instruction instead of the 8086 sequence. LSS 
loads the stack segment and stack pointer registers as a unit. 
Page faults may have special meanings to operating systems. For example, an OS 
may use them to load programs, regain control, extend the stack, or copy data pages 
for new programs. 


DEBUGGING FEATURES 


One major use of exceptions is in debugging. INT 3 (the 1-byte instruction breakpoint) 
is particularly valuable here. Debuggers can easily replace the first byte of an instruc- 
tion with INT 3, thus causing a trap back to a monitor or other systems software. This 
kind of breakpoint has been common in processors for many years. 

What are its limitations? First, it can produce only instruction breakpoints. You cannot 
use it to cause breakpoints on the reading or writing of specific data addresses. 
Second, replacement is impossible when an instruction is in ROM and often difficult 
when it is in the operating system. Besides, the debugger must physically replace an 
instruction, thus changing the underlying program. It must also restore the instruction 
later. One common problem in debugging is leftover breakpoints resulting from 
runaway programs or improper termination of the debugger. 
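
The bookkeeping that INT 3 breakpoints impose on a debugger can be sketched as follows. This is only a model: a Python bytearray stands in for writable program memory, SoftwareBreakpoints is a hypothetical class, and 0CCH is assumed as the one-byte INT 3 opcode.

```python
INT3 = 0xCC  # one-byte breakpoint opcode (assumed encoding of INT 3)


class SoftwareBreakpoints:
    """Track the instruction bytes that have been replaced by INT 3."""

    def __init__(self, memory):
        self.memory = memory  # bytearray standing in for program memory
        self.saved = {}       # address -> original first byte

    def set(self, address):
        if address in self.saved:
            return  # already patched; keep the true original byte
        original = self.memory[address]
        self.memory[address] = INT3  # raises TypeError if memory is read-only
        self.saved[address] = original

    def clear(self, address):
        self.memory[address] = self.saved.pop(address)

    def clear_all(self):
        # Skipping this step is what leaves leftover breakpoints behind.
        for address in list(self.saved):
            self.clear(address)
```

Passing an immutable bytes object instead of a bytearray makes set fail, which is a fair analogy for code in ROM. The need to call clear_all before the debugger exits is the source of the leftover-breakpoint problem mentioned above.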

The 80386 provides special hardware debugging features to overcome these limita- 
tions. They are: 

• A reserved debug interrupt vector (exception 1). 

• Four debug address registers that programmers can use to monitor addresses. 
No instruction replacement is necessary, so these addresses can 
be anywhere. They can be in either data memory or program memory. 

• A debug control register that programmers can use to specify debug conditions. 

• A debug status register that helps identify the cause of a debug exception. 

• The trap (T) bit of a task state segment to allow monitoring of task 
switches. T is bit 0 of the highest-addressed double word in the standard 
TSS (see Figure 6-1). 

• The resume flag (RF) in the flags register. It allows instruction restart 
after a debug exception for testing purposes. RF is bit 16 of the extended 
flags (see Figure 2-3). 

• The single-step (trap) flag (TF) in the flags register, which forces a debug 
exception after every instruction. TF is bit 8 of the extended flags (see 
Figure 2-3). 

These features make it easy to set breakpoints on such conditions as the following: 
• Task switch to a specific task 
• Instruction execution or data read or write at a specified address 


    DR0   Breakpoint 0 linear address 
    DR1   Breakpoint 1 linear address 
    DR2   Breakpoint 2 linear address 
    DR3   Breakpoint 3 linear address 
    DR4   Reserved 
    DR5   Reserved 
    DR6   Debug status 
    DR7   Debug control 

    NOTE: 0 means Intel reserved. Do not define. 

Figure 7-6 
80386 debug registers. 


DEBUG REGISTERS 


The 80386 has eight debug registers, as shown in Figure 7-6. You can access most of 
them through special MOV instructions that must execute at privilege level 0 (the most 
privileged level). MOV allows only 32-bit transfers between a debug register and a 
general-purpose register. The operand-size attribute does not apply. Note that Intel has 
reserved debug registers 4 and 5 for its own purposes. Programmers cannot access 
them. 

DRO-DR3, the debug address registers, contain linear addresses used in breakpoint 
conditions. Note that these are linear addresses, not physical addresses. 

DR7, the debug control register (see the details in Figure 7-6), defines the debug 
conditions. Its upper word contains read/write and length fields for each address 
register. The read/write (R/W) fields are coded as follows: 

00 - break on instruction execution only 
01 - break on data writes only 
10 - undefined 
11 - break on data accesses (reads or writes) but not on instruction fetches 

The length (LEN) fields have the following meanings: 

00 - 1-byte length 
01 - 2-byte length 
10 - undefined 
11 - 4-byte length 

This information applies only to data transfers. The field should be 00 for instruction 
fetches; all other values are undefined. 

The lower word of DR7 selectively enables the four address breakpoint conditions. 
Local (L) enables refer only to the current task; the processor resets them at every task 
switch. Global (G) enables refer to all tasks; task switches do not affect them. The LE 
and GE bits cause the reporting of a data breakpoint on the instruction that causes it. 
One of these bits should be set to use data breakpoints. 
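
A debugger has to assemble these fields into a single 32-bit value before loading DR7. The sketch below shows one way to do that in Python. The bit positions are assumptions based on Intel's documented DR7 layout (L and G enables for breakpoint n in bits 2n and 2n+1, LE and GE in bits 8 and 9, and the R/W and LEN fields in the four bits starting at bit 16 + 4n), and dr7_with_breakpoint is a hypothetical helper.

```python
RW_EXECUTE, RW_WRITE, RW_READ_WRITE = 0b00, 0b01, 0b11
LEN_1, LEN_2, LEN_4 = 0b00, 0b01, 0b11


def dr7_with_breakpoint(dr7, n, rw, length, local=True, global_enable=False):
    """Return a copy of dr7 with breakpoint n (0 to 3) configured."""
    if not 0 <= n <= 3:
        raise ValueError("breakpoint number must be 0 through 3")
    if rw not in (RW_EXECUTE, RW_WRITE, RW_READ_WRITE):
        raise ValueError("R/W code 10 is undefined")
    if length not in (LEN_1, LEN_2, LEN_4):
        raise ValueError("LEN code 10 is undefined")
    if rw == RW_EXECUTE and length != LEN_1:
        raise ValueError("LEN must be 00 for instruction breakpoints")

    shift = 16 + 4 * n
    dr7 &= ~(0xF << shift)                 # clear old LEN and R/W fields
    dr7 |= ((length << 2) | rw) << shift   # LEN sits above R/W
    dr7 &= ~(0b11 << (2 * n))              # clear old L and G enables
    if local:
        dr7 |= 1 << (2 * n)
    if global_enable:
        dr7 |= 1 << (2 * n + 1)
    if rw != RW_EXECUTE:
        # Data breakpoints should have LE or GE set as well.
        if local:
            dr7 |= 1 << 8
        if global_enable:
            dr7 |= 1 << 9
    return dr7 & 0xFFFFFFFF
```

For example, a local 4-byte write breakpoint on address register 0 yields 000D0101H. The breakpoint address itself still has to be placed in DR0 separately.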

DR6, the debug status register (see the details in Figure 7-6), contains the following 
bits: 

B0 through B3 indicate whether a debug exception has occurred under the conditions 
for a specific address register. These bits are set if an exception has occurred, 
regardless of whether it was enabled with a G or an L bit. 

BT is set if a task switch has occurred and the T (trap) bit of the new TSS is set. 

BS (great name!) is set if a single-step exception has occurred. This is the 
highest-priority debug exception. 

BD is set if the next instruction will read or write a debug register while Intel’s ICE-386 
(a popular hardware debugging device) is using the debug registers. 

Note that the processor sets the bits in the status register but never clears them. The 
debug handler must clear them explicitly. 


Flags to Test                    Condition 

BS=1                             Single-step trap 
B0=1 AND (GE0=1 OR LE0=1)        Breakpoint DR0, LEN0, R/W0 
B1=1 AND (GE1=1 OR LE1=1)        Breakpoint DR1, LEN1, R/W1 
B2=1 AND (GE2=1 OR LE2=1)        Breakpoint DR2, LEN2, R/W2 
B3=1 AND (GE3=1 OR LE3=1)        Breakpoint DR3, LEN3, R/W3 
BD=1                             Debug registers not available; in use by ICE-386. 
BT=1                             Task switch 

Figure 7-7 
80386 debug exception conditions. 


DEBUG EXCEPTIONS 


The 80386 reserves interrupt 1 for debug exceptions. Figure 7-7 lists the possible causes. 
The debugger can differentiate among these by examining the debug control and status 
registers (DR7 and DR6, respectively). Instruction address breakpoints are faults, 
whereas other debug conditions are traps. 
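
The tests in Figure 7-7 translate directly into code. The following Python sketch shows how a debug handler might list the causes of an exception from the contents of DR6 and DR7. The bit positions are assumptions based on Intel's documented register layout (B0 through B3 in bits 0 to 3 of DR6, BD in bit 13, BS in bit 14, BT in bit 15, and the per-breakpoint L and G enables in bits 2n and 2n+1 of DR7), and debug_causes is a hypothetical name.

```python
def debug_causes(dr6, dr7):
    """List the debug conditions reported by DR6, qualified by DR7."""
    causes = []
    if dr6 & (1 << 14):                      # BS
        causes.append("single-step")
    for n in range(4):
        hit = dr6 & (1 << n)                 # Bn
        enabled = dr7 & (0b11 << (2 * n))    # Ln or Gn
        if hit and enabled:
            causes.append(f"breakpoint {n}")
    if dr6 & (1 << 13):                      # BD
        causes.append("debug registers in use by ICE-386")
    if dr6 & (1 << 15):                      # BT
        causes.append("task switch")
    return causes
```

Because the processor never clears the status bits, a handler built along these lines would write DR6 back with the reported bits cleared before returning. Otherwise stale bits would show up in the next exception.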


SUMMARY 


Exceptions are internal conditions or instructions that cause the 80386 to suspend its 
normal activities and do special routines. The 80386 handles them just as it does 
external interrupts. It transfers control to a new routine through an entry in the interrupt 
descriptor table (IDT). The entry may be an interrupt, trap, or task gate. Interrupt and 
trap gates differ only in their effect on the Interrupt Enable (IF) flag; interrupt gates clear 
the flag, disabling interrupts, whereas trap gates leave it unchanged. A task gate 
provides a completely new context for the exception handler. 

Exceptions may be either processor detected or programmed. Programmed excep- 
tions are the so-called software interrupts INTO, INT 3, INT n, and BOUND. Proces- 
sor-detected exceptions include divide errors, debug exceptions, invalid opcodes, 
coprocessor errors, invalid task state segments, segment not-present errors, stack ex- 
ceptions, page faults, and general protection exceptions. 

Exceptions can be classified (in order of seriousness) as faults, traps, or aborts. The 
processor can restart a faulted instruction. It can go on to the next instruction after
servicing a trap. It can only report the error in the case of an abort. Double faults occur
when there is a serious exception during the execution of the handler for a serious ex- 
ception. If still another fault occurs during the execution of the double fault handler, 
the processor shuts down completely. 

Faults that occur during task switches cause special problems. The major difficulty 
is that the processor may not have checked the new task’s state completely. Thus more 
faults may occur if the processor tries to use segment registers and other resources. The
usual solution is to implement the fault handler as a task with its own (known valid) 
context. 

The 80386 provides special debugging facilities as well as the usual traps, break- 
points, and single-step mode. These facilities allow the programmer to set breakpoints 
at any of four addresses under a variety of conditions. No slowdown occurs, and no
instructions need to be replaced. Breakpoints may occur on instruction fetches, data
accesses, or data writes.


Only a signal shown and a distant voice in the darkness.
Longfellow, Tales of a Wayside Inn 


Handshakes can be faked and usually are, but smiles can’t. 
Rex Stout, Homicide Trinity 


This chapter covers the 80386’s hardware features. It describes the processor’s signal 
structure, bus operations, memory interface, and numeric coprocessors. A final section 
discusses cache memory. 


NEW 80386 FEATURES 


The 80386’s new hardware features are:
• Full 32-bit address and data buses with automatic handling of byte, word,
  and misaligned transfers. Intel calls this feature dynamic data bus sizing.
• Address pipelining that allows the overlapping of successive memory
  cycles. It gives extra time for a memory access without reducing overall
  system performance.
• Provisions for both 32-bit and 16-bit data buses.
• Support for a 32-bit numerical coprocessor, the 80387 (and for the 32-bit
  Weitek 1167 floating point chip set).

The 80386’s high clock speeds (16 MHz and up) mean that it needs fast memory to
operate without delays. However, a large amount of such memory is expensive. Ways
to reduce costs without sacrificing performance include:

• Overlapping bus cycles so that the next memory or I/O access can begin
  before the current one ends.

• Dividing memory into banks (units with their own control circuitry) so
  that accesses to successive addresses can occur without delays. A single
  bank may require waiting time between accesses.

• Saving frequently used instructions and data in a small amount of
  high-speed memory (called a cache). Cache memory has the same
  relationship to main memory that main memory has to disk storage.

All three techniques have been used previously in the design of larger computers. 


80386 EXTERNAL SIGNALS 


Table 8-1 lists the 80386’s signal pins, along with their functions and characteristics.
A # symbol after a signal’s name indicates that it is active low (0 is the named or
active state; 1 is the opposite or inactive state). The terms commonly used to identify
states are asserted (that is, in the active state) and negated (that is, in the inactive state).
If the signal has two names, the one followed by # is the low (0) state. We may group
the status and control signals (not including the data bus, address bus, and clock) into
the following categories:

• Memory and I/O transfer (handshake)

• Startup

• Coprocessor

• Interrupt

• DMA


Memory and I/O Transfer Control Signals 


The memory and I/O transfer control signals are described in the following paragraphs.


Table 8-1
Summary of 80386 signal pins

                                                           Input       Output
                                          Active  Input/   Synch or    High Impedance
Signal Name  Signal Function              State   Output   Asynch      During HLDA?
                                                           to CLK2

ADS#         Address Status               Low     O        -           Yes
NA#          Next Address Request         Low     I        S           -
READY#       Transfer Acknowledge         Low     I        S           -
HLDA         Bus Hold Acknowledge         High    O        -           No
ERROR#       Coprocessor Error            Low     I        A           -
INTR         Maskable Interrupt Request   High    I        A           -
NMI          Non-Maskable Intrpt Request  High    I        A           -
RESET        Reset                        High    I        S           -


Figure 8-1 
Using the byte enable (BE) signals to select 8-bit banks of memory. 


Table 8-2
Possible Data Transfers on the 32-Bit Data Bus

Size        Byte Enables

32 bits     3-2-1-0

24 bits     3-2-1
            2-1-0

16 bits     3-2
            2-1
            1-0

8 bits      3
            2
            1
            0


[Figure: byte, word, and doubleword addresses laid out against data bus bits 31-24, 23-16, 15-8, and 7-0.]

Figure 8-2 
Address, data bus, and byte enables for 32-bit bus. 


BE0# through BE3# are enable signals that select which bytes the processor will
transfer during an I/O or memory cycle. Computer designs generally use the BE
signals to select 8-bit units (banks) of memory as shown in Figure 8-1. BE0# enables
transfers over data bus lines 0 through 7 (D0 through D7), BE1# over D8 through D15,
BE2# over D16 through D23, and BE3# over D24 through D31. Figure 8-2 shows the
correspondences among byte enables, byte addresses, word addresses, and double word
addresses. Assuming the design in Figure 8-1, the processor can transfer a double word
by asserting all four enables. In fact, it can transfer any contiguous set of bytes. All it
must do is assert the enables given by an entry from Table 8-2.
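
The mapping from a byte address and transfer size to the active byte enables can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the transfer is assumed to fit within one doubleword, and the enables are reported as numbers 0 through 3 rather than as signal levels.

```python
def byte_enables(address, size):
    """Return the byte enable numbers (0-3) asserted for a transfer of
    `size` bytes starting at `address`, assuming it fits in one doubleword."""
    offset = address & 3                  # position within the doubleword
    if size < 1 or offset + size > 4:
        raise ValueError("transfer does not fit in one doubleword")
    return list(range(offset, offset + size))


def address_bus(address):
    """Return the doubleword address carried on A31-A2 (low two bits cleared)."""
    return address & ~3
```

For example, `byte_enables(0x1001, 2)` gives `[1, 2]`, the 2-1 entry among the 16-bit transfers in Table 8-2.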

The processor can also use the BE signals to transfer misaligned words or double
words, that is, those not starting at an address divisible by 2 or 4, respectively. A
misaligned transfer that crosses a doubleword boundary requires two memory cycles
instead of one. Table 8-3 lists the contents of the address bus and the active byte
enables during both cycles for all such alignments. Figure 8-3 shows the
cycles from the memory’s point of view. The example is a 32-bit transfer with an even
but misaligned address (that is, an address divisible by 2 but not by 4). Note that the
24-bit transfers listed in Table 8-2 occur only as part of the transfer of a misaligned
double word. There are no 24-bit instructions.
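
The splitting of a misaligned transfer into two cycles can be sketched as below. The function name is hypothetical, and the cycle order (upper doubleword first, then lower) simply follows the order in which Table 8-3 lists the two cycles.

```python
def bus_cycles(address, size):
    """Return a list of (address_bus, byte_enables) pairs, one per bus cycle,
    for a transfer of `size` bytes (1, 2, or 4) starting at `address`."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2, or 4 bytes")
    base = address & ~3                   # doubleword address 4N
    offset = address & 3
    if offset + size <= 4:
        # Fits in one doubleword: a single cycle is enough.
        return [(base, list(range(offset, offset + size)))]
    spill = offset + size - 4             # bytes falling in the next doubleword
    return [
        (base + 4, list(range(0, spill))),   # first cycle: address 4N + 4
        (base, list(range(offset, 4))),      # second cycle: address 4N
    ]
```

A doubleword at 4N + 2 thus produces byte enables 0-1 at 4N + 4, followed by 2-3 at 4N.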

Although byte enable signals are convenient for local memory control, they may not
be adequate for multiprocessor or external bus-based systems. Standard buses such as
Multibus I, Multibus II, and VME generally require explicit A0 and A1 signals. Extra
gates are necessary to produce them. For descriptions of these buses, see the
documentation from Intel and the Multibus Manufacturers’ Group for Multibus and from
Motorola and the VME Users’ Group for the VME bus.


Table 8-3
Misaligned Data Transfers on a 32-Bit Bus

                              First Cycle:             Second Cycle:
Transfer      Physical        Address     Byte         Address     Byte
Type          Address         Bus         Enables      Bus         Enables

Word          4N + 3          4N + 4      0            4N          3
Doubleword    4N + 1          4N + 4      0            4N          1-3
Doubleword    4N + 2          4N + 4      0-1          4N          2-3
Doubleword    4N + 3          4N + 4      0-2          4N          3

NOTE: 4N = Nth doubleword address

W/R# is a write-read indicator. It differentiates between input (0) and output (1)
cycles.

D/C# is a data-control indicator. It differentiates between cycles used for control
purposes (0) and those used to transfer data (1). Memory and I/O devices may use the
bus only during data cycles. Other devices may use the bus during control cycles. For
example, interrupt acknowledge cycles send D/C# low. The result is to activate neither
memory nor I/O. The vector source can then put the interrupt type on the data bus
without having to contend with memories or input ports.

M/IO# is a memory-I/O indicator. It differentiates between cycles used to access
memory (1) and those used to access I/O (0). This signal is the only difference between
input/output (IN, OUT) and memory transfer (MOV) instructions. As far as the 80386
is concerned, an I/O device is anything activated by this signal being low. A memory
is anything activated by it being high. All other distinctions (size, function, speed, etc.)
are beyond the 80386’s comprehension. Note that it is thus perfectly acceptable for the
80386 to access I/O devices through memory addresses (called memory-mapped I/O)
or memory through I/O addresses (perhaps a local buffer for an I/O device).


[Figure: first bus cycle with A31-A2 = n + 4; 32-bit memory.]

Figure 8-3
A 32-bit misaligned data transfer.


LOCK#, the bus lock indicator, is used to identify special cycles in multiprocessor
systems. This signal generally tells other processors that they may not take control of
the address and data buses. The usual reason is that the processor is doing an indivisible
operation such as updating a selector and an offset. If another processor took control
during the operation, it might find only one of the address’s two parts updated. A
processor may also assert LOCK# while changing shared memory, control flags
(semaphores), or interface parameters. Note that it may be necessary to lock a
read-modify-write sequence in which a processor examines and updates a shared location.


Figure 8-4
Typical RESET timing circuit.

ADS#, the address status signal, indicates that the address bus outputs are valid and 
can be decoded. ADS# marks the beginning of each memory or I/O cycle. 

NA#, the next address request signal, is an input indicating that the memory has 
finished with the last address and can accept another. This signal is the key to pipelin- 
ing address cycles, that is, starting the next cycle before its predecessor ends. The result 
is higher throughput (because of the overlap) at the cost of some circuit complexity. 
Note that memory or I/O boards must produce NA# to allow address pipelining. 

BS16# is a signal indicating whether the 80386 is using a 32-bit data bus (1) or a
16-bit bus (0). A rather disappointing explanation for a suggestive name. When BS16#
is asserted, data transfers occur over the lower half of the data bus. Transfers of more
than 16 bits take two cycles.

BS16# is an input signal that can help connect the 32-bit processor to a 16-bit
memory section. Presumably, this makes sense only as a stopgap measure for 16-bit
systems. Connecting 32-bit processors to 16-bit buses increases throughput over
strictly 16-bit systems. For example, many companies make add-on boards that put an
80386 CPU in an 80286-based computer such as an IBM PC AT. Of course, the overall
system data bus remains 16 bits wide. Ultimately, however, the situation is like having a
brand new 8-lane bridge with 4-lane connector roads at both ends.

READY# is an acknowledgment from the memory or I/O section, indicating the
successful completion of a transfer. Circuitry on a memory or I/O board can extend
READY# by extra clock periods to wait for slow devices. The common use is for slow
memory. That memory could be cheaper, use less power, resist radiation damage, retain
its contents indefinitely, or have other special properties. READY# can also be used
to interface fast I/O devices such as disks and image processing boards that run at close
to CPU speeds (say, within a factor of 10 of the CPU clock rate).


Figure 8-5
Initial contents of control register 0 after RESET.


Startup Signals 


The main startup signal is RESET. Figure 8-4 shows a typical circuit to derive it from
a panel switch. The 82384 device is a clock generator; it synchronizes RESET with the
system clock. RESET has the following effects:
• It puts the 80386 in real mode.
• It gives registers and flags the initial values listed in Table 8-4.
• It sets control register 0 as shown in Figure 8-5. The ET (extension type,
  not extraterrestrial) bit indicates whether the system has an 80387
  coprocessor (1) or not (0).
• It puts the output pins in the states listed in Table 8-5. Note that the
  address lines are all high and the byte enables are all active.

Note that, after RESET, the processor automatically brings address lines A20
through A31 high during instruction fetches. Thus instruction execution begins at
physical address FFFFFFF0H, not FFF0H, as the values of CS and IP alone would
suggest. The first far (intersegment) JMP or CALL brings address lines A20 through A31
low so that the processor continues executing instructions in the bottom 1 Mb of
physical memory. RESET is usually applied separately to other system devices, such as
coprocessors (80287 or 80387), parallel interfaces (8255), and DMA controllers
(82258 or 82380).


Table 8-4
Processor Registers and Flags After Reset

Register               Initial Value

CR0                    See Figure 8-5
CS selector            0
DS selector            0
DX                     Device ID (Figure 8-6)
EAX                    Depends on self-test if requested (0 means “passed”).
EFLAGS                 2 (Parity flag = 1)
EIP                    FFF0H
ES                     0
FS                     0
IDTR base              0
IDTR limit             3FFH
SS                     0
All other registers    Undefined


CONTROL REGISTER ZERO

0 - PAGING DISABLED
* - INDICATES PRESENCE OF 80387
0 - NO TASK SWITCH
0 - DO NOT MONITOR
0 - COPROCESSOR NOT PRESENT
0 - PROTECTION NOT ENABLED (REAL ADDRESS MODE)


Figure 8-6
Contents of EDX register after RESET.


Table 8-5
80386 Output Pin States During RESET

Pin Name                          Pin State

LOCK#, D/C#, ADS#, A31-A2         High
W/R#, M/IO#, HLDA, BE3#-BE0#      Low
D31-D0                            Three-State


Coprocessor Signals 


The coprocessor signals are:

• BUSY# is a status signal from the coprocessor indicating that it is not
  done with its current operation. An active BUSY# means that the
  coprocessor cannot accept a new instruction (the 80387 allows overlap
  in some cases).

• ERROR# is a status signal from the coprocessor indicating that its latest
  operation produced an error. Typical causes are an invalid operation,
  overflow, a zero divisor, underflow, a denormalized operand (outside the
  device’s range), or an inexact result. An invalid operation could be stack
  overflow or underflow or the encountering of an indefinite form such as
  0/0. During initialization, ERROR# indicates whether the system has an
  80387 coprocessor.

• PEREQ (coprocessor request) is a status signal from the coprocessor
  indicating that it is ready to transfer data. The coprocessor does not
  control the address and data buses on its own. Instead, it depends on the
  80386 for all data transfers.


Interrupt Control Signals


The interrupt inputs are:
• INTR, the maskable interrupt used for I/O and other normal system
  functions. The input is level sensitive.
• NMI, the nonmaskable interrupt used for catastrophic events such as
  power failure or bus parity errors. The input is edge sensitive so that it
  will not interrupt its own service routine.

There is no interrupt acknowledge output. Instead, the processor responds to INTR 
interrupts with special interrupt acknowledge cycles. We will discuss them when we 
describe bus operations. NMI interrupts have their own fixed vector (#2; see Table 
4-3), so they do not need acknowledge cycles. 


DMA Signals 


The DMA signals are:
• A HOLD input that tells the 80386 to relinquish its buses to the external
  controller (bus master).
• An HLDA (hold acknowledge) output that informs the external bus
  master that it may take control of the bus.

A DMA controller (usually a single chip with some peripheral circuitry) must 
manage the DMA system. It must provide handshaking and control the activation and 
prioritization of individual DMA channels. The requestor must keep HOLD active as 
long as it needs the bus. It must not take control of the bus before receiving the HLDA 
acknowledgment from the processor. 


80386 BUS OPERATION 


Figure 8-7 contains timing diagrams for nonpipelined read cycles. The cycle at the left
operates at full speed, whereas the one at the right includes an extra wait state. Figure
8-8 contains similar diagrams for write cycles. A cycle consists of at least two bus
states, designated as T1 and T2. Each bus state in turn consists of two CLK2 cycles
(CLK2 runs at twice the internal processor clock frequency). Some diagrams
designate the CLK2 cycles as φ1 and φ2, respectively.
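
The timing arithmetic implied by this structure is easy to work through. The sketch below is only an illustration: the function names are hypothetical, each wait state is assumed to add one bus state (one processor clock period, that is, two CLK2 periods), and the bandwidth figure assumes a full 4-byte transfer in every cycle with no idle states.

```python
def bus_cycle_ns(clock_mhz, wait_states=0):
    """Duration of one nonpipelined bus cycle in nanoseconds.

    A cycle is T1 + T2 plus one extra bus state per wait state; each bus
    state lasts one processor clock period (two CLK2 periods)."""
    bus_states = 2 + wait_states
    return bus_states * 1000.0 / clock_mhz


def peak_bandwidth_mb_per_s(clock_mhz, wait_states=0, bytes_per_cycle=4):
    """Upper bound on transfer rate in millions of bytes per second."""
    return bytes_per_cycle * 1000.0 / bus_cycle_ns(clock_mhz, wait_states)
```

At 16 MHz, a zero-wait-state cycle takes 125 ns, for a ceiling of 32 million bytes per second; one wait state stretches the cycle to 187.5 ns.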

Figure 8-7
Non-pipelined address read cycles.


Figure 8-8
Non-pipelined address write cycles.


Table 8-6
Bus Cycle Definitions

M/IO#   D/C#   W/R#   Bus Cycle Type                                 Locked?

High    Low    High   HALT: Address = 2      SHUTDOWN: Address = 0   No
                      (BE0# High             (BE0# Low
                      BE1# High              BE1# High
                      BE2# Low               BE2# High
                      BE3# High              BE3# High
                      A2-A31 Low)            A2-A31 Low)
High    High   Low    MEMORY DATA READ                               Some cycles
High    High   High   MEMORY DATA WRITE                              Some cycles


Following any idle bus state (Ti), addresses are non-pipelined. Within non-pipelined bus
cycles, NA# is only sampled during wait states. Therefore, to begin address pipelining
during a group of non-pipelined bus cycles requires a non-pipelined cycle with at least
one wait state (Cycle 2 above).

Figure 8-9
Pipelined address cycles.


Nonpipelined 80386 bus cycles work as follows:

1. The processor starts the cycle (in state T1) by bringing ADS# (address status) low.
ADS# indicates that the address bus’ contents are valid and can be decoded and
latched.

2. The processor brings the memory control signals to the states appropriate for the
current cycle. These signals include the byte enables (BE3# through BE0#) and the
bus status outputs (M/IO#, D/C#, W/R#, and LOCK#). Table 8-6 defines different
types of bus cycles in terms of these signals. Obviously, memory read cycles are by
far the most common type, as all instruction fetches fall in this category.

3. At the end of T2, the processor samples READY#. If it is active (low), the
processor reads the input data in a read cycle or terminates a write cycle. If READY# is
inactive, the processor waits for another clock cycle (designated as an extra T2 in
the extended cycles of Figures 8-7 and 8-8) before sampling it again.

The extension process continues as long as READY# remains inactive. This creates
an obvious problem if a programming error makes the processor access a nonexistent
address. The philosophical rule (very Zen-like) is that nonexistent memory is never
ready. The usual solution is to have a timing circuit (called a watchdog timer) that is
activated during each bus cycle. If the cycle does not terminate after a reasonable
number of clock cycles, the timer asserts READY# and causes an interrupt.
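
The watchdog behavior can be modeled with a short simulation. This is a sketch only: the function name, the `ready_after` parameter, and the 16-clock timeout are hypothetical choices for illustration, not values from any particular design.

```python
def run_bus_cycle(ready_after, timeout_clocks=16):
    """Simulate the wait for READY# during one bus cycle.

    ready_after: clock on which the addressed device asserts READY#,
                 or None for a nonexistent address (never ready).
    Returns (clocks_elapsed, timed_out). When timed_out is True, the
    watchdog has forced READY# and would also request an interrupt."""
    clocks = 0
    while True:
        clocks += 1
        if ready_after is not None and clocks >= ready_after:
            return clocks, False          # device responded normally
        if clocks >= timeout_clocks:
            return clocks, True           # watchdog terminates the cycle
```

A device that answers on the third clock yields `(3, False)`; an access to a nonexistent address ends at the timeout with `(16, True)`.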

Note that ADS# can be deactivated at the end of T1, whereas the bus control sig- 
nals remain valid until close to the end of the entire cycle. In write cycles, output data 
becomes valid on the data bus at the start of phase 2 in T1. 
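
External logic that decodes the bus status outputs amounts to a lookup on three signals. The sketch below is an assumption-laden illustration: the dictionary and function names are hypothetical, 1 stands for high and 0 for low, and the eight encodings are those of Intel's standard 80386 bus cycle definition table, which agree with the entries of Table 8-6 and with the description of interrupt acknowledge cycles in this chapter.

```python
# Keys are (M/IO#, D/C#, W/R#) with 1 = high and 0 = low.
BUS_CYCLE_TYPES = {
    (0, 0, 0): "interrupt acknowledge",
    (0, 0, 1): "does not occur",
    (0, 1, 0): "I/O data read",
    (0, 1, 1): "I/O data write",
    (1, 0, 0): "memory code read",
    (1, 0, 1): "halt or shutdown",
    (1, 1, 0): "memory data read",
    (1, 1, 1): "memory data write",
}


def bus_cycle_type(m_io, d_c, w_r):
    """Return the type of bus cycle indicated by the three status outputs."""
    return BUS_CYCLE_TYPES[(m_io, d_c, w_r)]
```

In hardware this decoding is typically done by a PAL or a small decoder rather than by software; the table form simply makes the encoding easy to check.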


Pipelined Bus Cycles 


Pipelined bus cycles allow overlapped signals as shown in Figure 8-9. The second half 
of each cycle (designated T2P in Figure 8-9) is the time for both data transfers and es- 
tablishing address and control signals for the next cycle. 

The key new signal is NA# (next address). Memory units that allow address
pipelining must assert it. The processor samples it at the beginning of phase 2 of each CLK
cycle in which ADS# is inactive. If NA# is active, the processor sends out the address,
byte enables, and bus status signals for the next bus cycle.

Note the following characteristics of pipelined systems:

• The first bus cycle after an idle bus state is always nonpipelined.

• The bus cycle in which NA# is first recognized must be extended by at
  least one CLK cycle to allow the output of address and status before its
  end. There is thus some initial overhead that occurs after any idle state.


Interrupt Acknowledge Cycles 


The 80386 does special bus cycles in response to an INTR interrupt. They serve as an 
acknowledgment to external devices such as an 8259 interrupt controller. At this time, 
the controller can put a vector (the interrupt type) on the data bus for the CPU to read. 
Figure 8-10 contains timing diagrams for the interrupt acknowledge cycles. There are 
two of them, separated by a gap. M/IO#, D/C#, and W/R# are all low during the cycles. 
External circuitry such as programmable array logic usually decodes this state to form 
an interrupt acknowledge output.


Interrupt Vector (0-255) is read on D0-D7 at end of second Interrupt Acknowledge bus cycle.
Because each Interrupt Acknowledge bus cycle is followed by idle bus states, asserting NA#
has no practical effect. Choose the approach which is simplest for your system hardware design.


Figure 8-10 
Interrupt acknowledge bus cycles. 


Note that the interrupt control circuitry must ensure a proper response to the
acknowledge cycles. The 80386 does not supply any way to distinguish them. Instead, it
always provides the same address:

BE0# low
BE1#, BE2#, and BE3# high
Address 4 (A2 = 1, everything else is 0) during the first cycle, address 0
during the second cycle (see Figure 8-10).


Table 8-7
80386 Performance with Wait States and Pipelining

Wait States       Wait States          Performance Relative    Bus
When Address      When Address         to Non-Pipelined        Utilization
Is Pipelined      Is Not Pipelined     0 Wait-State

0                 0                    1.00                    73%
0                 1                    0.91                    79%
1                 1                    0.81                    86%
1                 2                    0.76                    89%
2                 2                    0.66                    91%
2                 3                    0.63                    92%
3                 3                    0.57                    93%

The vector transfer always occurs on D0 through D7 at the end of cycle 2, as shown
in Figure 8-10. The processor automatically puts four idle bus states between the two 
cycles to synchronize with an 8259 priority interrupt controller. ADS# is active at the 
start of each interrupt acknowledge cycle, and the control circuitry must respond by 
asserting READY#. 


BUS PERFORMANCE CONSIDERATIONS 


The actual performance of 80386 bus cycles depends on the number of wait states re- 
quired and on whether the address is pipelined. Table 8-7 shows simulated results for 
different numbers of wait states and different pipelining conditions. Address pipelin- 
ing reduces the need for wait states. Because of the overlap, an access requiring two 
wait states without pipelining will require only one wait state with it. This can increase 
throughput significantly without requiring faster memory.


COPROCESSORS


The 80386 can use either an 80287 or an 80387 numeric coprocessor. Both are software 
compatible with the popular 8087 coprocessor, often used with 8086 or 8088 CPUs. 
The 80386 can also use the high-speed Weitek WTL1167 floating point chip set. These 
devices perform numeric instructions in parallel with the 80386, specializing par- 
ticularly in 80-bit floating point arithmetic defined by IEEE Standard 754. The 80287 
is a 16-bit device, whereas the 80387 is a 32-bit device. The 80387 also runs at higher
clock speeds than the 80287 and has extra trigonometric functions (sines and cosines). 

The key instruction for a coprocessor is ESC, designated by the binary pattern 11011
in the five most significant bits of the first opcode byte. The 80386 sends ESC
instructions on to the coprocessor through I/O addresses 800000F8H and 800000FCH.
The execution does not depend on the 80386’s I/O privilege level.
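
Recognizing an ESC instruction from its first opcode byte is a one-line test. The function name below is hypothetical; the bit pattern is the one given above.

```python
def is_esc_opcode(opcode_byte):
    """True if the five most significant bits of the byte are 11011."""
    return (opcode_byte >> 3) == 0b11011
```

The test accepts exactly the eight opcode bytes D8H through DFH.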

The coprocessor addresses are completely separated from conventional I/O addres- 
ses by having A31 high. Note that the 80386 sends address lines A16 through A31 low 
when accessing ports in its standard 64K I/O space. Of course, the I/O section must 
decode A31 to avoid conflict between coprocessor cycles and I/O cycles. 

The 80386 cannot transfer an instruction to the coprocessor until the BUSY# input
is inactive (high). The 80386 knows when to transfer data to or from the coprocessor 
because PEREQ (coprocessor request) goes high. 


80287 Numeric Coprocessor Interface 


Figure 8-11 shows a typical interface between an 80386 microprocessor and an 80287
numeric coprocessor. The 80287 is selected when M/IO# is low and A31 is high. That
is, it is selected during I/O cycles only if A31 = 1. This works if all other I/O ports are
specifically selected by A31 = 0. Note that data transfers occur only over the lower half
of the data bus, as the 80287 is a 16-bit device. The 80287 requires its own clock
generator, as it runs much slower than an 80386 and does not use a double-frequency
clock input. The difference in clock rates and timing also makes some latches necessary.

In response to an ESC instruction, the 80386 does one or more I/O cycles to the
80287’s ports. The 80386 automatically converts 32-bit memory transfers into 16-bit
transfers, and vice versa. That is, it converts 32-bit transfers to the 80287 into two
successive 16-bit transfers and combines 16-bit transfers from the 80287 into 32-bit
memory transfers. This happens automatically; there is no need to activate the
80386’s BS16# input.


Figure 8-11
80386 system with 80287 coprocessor.


The 80287 uses its command inputs (CMD0 and CMD1) to differentiate between
data and commands. The interface in Figure 8-11 has the CMD lines connected to
ground (CMD1) and to A2 (CMD0), respectively. The 80287 thus interprets outputs
to address 800000F8H (A2 = 0) as commands, and those sent to address 800000FCH
(A2 = 1) as data.


80387 Numeric Coprocessor Interface


Figure 8-12 shows a typical interface between an 80386 microprocessor and an 80387
numeric coprocessor. Like the 80287, the 80387 is selected during I/O cycles when
A31 = 1.


Figure 8-12
80386 system with 80387 coprocessor.

The connections are mostly direct here, reflecting the fact that Intel designed the 
80387 specifically to work with the 80386. Of course, data transfers are a full 32 bits. 
The system must disable other data bus connections during 80387 transfers. As address 
line A2 is connected directly to CMD#, its value differentiates between data (1) and 
commands (0). 
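
The selection rules for both coprocessors reduce to two address bits during an I/O cycle. The sketch below is illustrative only: the names are hypothetical, and it assumes the cycle has already been identified as an I/O cycle (M/IO# low).

```python
COPROCESSOR_COMMAND_PORT = 0x800000F8   # A31 = 1, A2 = 0
COPROCESSOR_DATA_PORT = 0x800000FC      # A31 = 1, A2 = 1


def classify_io_cycle(address):
    """Classify an I/O cycle by address lines A31 and A2."""
    if not (address >> 31) & 1:
        return "ordinary I/O port"       # A31 low: standard 64K I/O space
    if (address >> 2) & 1:
        return "coprocessor data"        # A2 = 1
    return "coprocessor command"         # A2 = 0
```

Note that the sketch checks only A31 and A2, as the interfaces described here do; a design that decodes more address lines would be stricter.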

The 80387 may have its own clock, which controls the numeric core. The bus
interface unit uses the 80386’s clock. The two sections communicate through a FIFO
buffer.

The only special problem in the interface is that read cycles (data transfers from the
80387 to the 80386) require at least one wait state. Write cycles do not require this.
The READYO# output of the 80387 generates the extra wait state automatically. 80387
write cycles can overlap the execution of the previous instruction.


Local Coprocessor Bus Cycles 


Coprocessors can interact with the 80386 in either of two ways:
• The processor sends commands and data to the coprocessor as part of the
  execution of an ESC instruction.
• The coprocessor requests data transfers using the PEREQ signal.

Note that, in the ESC case, the 80386 sets an internal memory address base register,
memory address limit register, and direction flag. The coprocessor can then request
operand transfers by activating PEREQ. This can happen only while the coprocessor
is executing an instruction.

Operand transfers may take a long time. For example, the operands may be
misaligned, thus requiring extra cycles. Furthermore, operands may be too long for the
80287’s 16-bit bus or even for the 80387’s 32-bit bus. In particular, IEEE 754
double-precision floating point numbers are 64 bits long (sign, 11-bit exponent, and
52-bit significand).


; initialization routine to detect an 80287 Numeric Processor

FND_287:  FNINIT              ; initialize Numeric Processor
          FSTSW  AX           ; retrieve 80287 status word
          OR     AL,AL        ; test low byte of 80287 status word
                              ; if all zero, then 80287 present and
                              ; properly initialized
                              ; if not all zero, then 80287 absent
          JZ     GOT_287      ; branch if 80287 present
          SMSW   AX           ; no Numeric Processor
          OR     AX,04H       ; set EM bit in machine status word
          LMSW   AX           ; to enable software emulation of 80287
          JMP    CONTINUE
GOT_287:  SMSW   AX           ; Numeric Processor present
          OR     AX,02H       ; set MP bit in machine status word
          LMSW   AX           ; to permit normal 80287 operation
CONTINUE:                     ; and off we go


Figure 8-13
Routine to detect the presence of an 80287 numeric coprocessor.
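
The operand sizes mentioned in this section can be checked with a short sketch. It splits an IEEE 754 double-precision value into its three fields and counts the bus transfers needed to move an operand. The function names are hypothetical, and the transfer count assumes aligned operands with no extra cycles.

```python
import struct


def double_fields(value):
    """Split a double into (sign, 11-bit exponent, 52-bit significand field)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    significand = bits & ((1 << 52) - 1)
    return sign, exponent, significand


def transfers_needed(operand_bits, bus_bits):
    """Number of bus transfers for an aligned operand (rounded up)."""
    return -(-operand_bits // bus_bits)
```

A 64-bit operand takes four transfers over the 80287's 16-bit bus and two over the 80387's 32-bit bus; `double_fields(1.0)` returns `(0, 1023, 0)` because the exponent is stored with a bias of 1023.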


80287/80387 Recognition 


The basic way for an 80386 processor to determine whether an 80387 coprocessor is
present is to test the initial state of the ERROR# input. The 80387 makes ERROR#
active after RESET. The processor checks it automatically at that time (and before
executing the first instruction). If ERROR# is active, the processor sets the ET bit (bit 4
of control register 0). All the program must do is execute an FINIT instruction to reset
the 80387’s ERROR# output.

If ERROR# is inactive, the processor clears the ET bit. Note that ET = 1 means that
an 80387 is present. ET = 0 means only that an 80387 is not present. There could be
an 80287 in the circuit or no coprocessor at all. A routine like the one shown in Figure
8-13 can tell these alternatives apart by reading the 80287’s status word. The result will
be FFFF hex (all ones) if no 80287 is present, as the lines will be floating.
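
The overall decision can be summarized in a short sketch. It is an illustration, not a replacement for the routine in Figure 8-13: the function name is hypothetical, the status word is assumed to have been read immediately after initializing the coprocessor, and the masks are those the routine uses (04H for the EM bit and 02H for the MP bit of control register 0), together with bit 4 for ET.

```python
CR0_MP = 0x02   # Math Present, bit 1
CR0_EM = 0x04   # Emulate Coprocessor, bit 2
CR0_ET = 0x10   # Extension Type, bit 4


def configure_coprocessor(cr0, status_word):
    """Return (coprocessor, new_cr0) from the ET bit and the status word
    read just after coprocessor initialization."""
    if cr0 & CR0_ET:
        return "80387", cr0                  # ET = 1: an 80387 is present
    if (status_word & 0xFF) == 0:
        return "80287", cr0 | CR0_MP         # low byte clear: 80287 answered
    return "none", cr0 | CR0_EM              # floating lines: emulate instead
```

With no coprocessor installed, the floating lines read back as FFFFH, so the low byte is nonzero and the sketch selects emulation.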

If there is no 80287 available, the program must set the Emulate Coprocessor (EM)
bit (bit 2 of Control Register 0). The processor will then emulate coprocessor
instructions in software; this maintains compatibility between systems with coprocessors and
those without them. Of course, systems that must emulate the coprocessor instructions
will run much more slowly. If an 80287 is present, the program must set the MP (Math
Present) bit. MP is bit 1 of Control Register 0 (see Figure 2-4).


Coprocessor Exceptions


There are three coprocessor exceptions:

• Interrupt 7: Coprocessor not available

• Interrupt 9: Coprocessor segment overrun

• Interrupt 16: Coprocessor error

Interrupts 7 and 16 are benign exceptions (see Table 7-4) that software can handle.
Interrupt 9 is a contributory exception which should be avoided. Interrupt 13 (a
contributory exception) can also occur if an operand lies outside the segment limit or
violates some other protection restriction.

Two situations can cause interrupt 7: 

• The processor tries to execute an ESC instruction when the EM bit of 
control register 0 is set. The exception handler must direct the processor 
to the software emulation of the instruction. 

• The processor tries to execute either a WAIT or an ESC instruction with 
both the MP (Math Present) and TS (Task Switched) bits set. This means 
that the processor has switched tasks since the last time it used the 
coprocessor. The exception handler must determine if the current task is 
the same as the one that was executing the last time the coprocessor was 
used (that is, the processor has returned to it after switching away). If not, 
the handler must save the coprocessor’s context in the previous task’s 
TSS. 
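
The two conditions above can be written as a small predicate. The following Python sketch simply restates them; the function name and the string encoding of the instruction type are hypothetical, and the sketch makes no claim about cases the text does not describe.

```python
def raises_interrupt_7(instruction: str, em: bool, mp: bool, ts: bool) -> bool:
    """Apply the two interrupt 7 conditions listed in the text.

    instruction is "ESC" or "WAIT"; em, mp, and ts are the CR0 bits.
    """
    if instruction not in ("ESC", "WAIT"):
        raise ValueError("only ESC and WAIT can cause interrupt 7")
    if instruction == "ESC" and em:
        return True          # first condition: ESC with EM set
    return mp and ts         # second condition: WAIT or ESC with MP and TS set
```

A handler reached through the second condition would then compare the current task with the last task that used the coprocessor before deciding whether to save context.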

Interrupt 9 occurs in protected mode if an operand of a coprocessor instruction wraps 
around an addressing limit or spans inaccessible addresses. Proper alignment of 
operands will eliminate this problem. 

Interrupt 16 occurs if the coprocessor detects an exception condition during instruc- 
tion execution. The 80386 recognizes the problem when it checks ERROR# at the 
beginning of a WAIT or certain ESC instructions. The handler must examine the 
coprocessor’s status register to determine the cause of the condition. 


MEMORY INTERFACING 


The basic interface for 80386 memory sections consists of the following: 

• Control signal generation circuitry to provide memory, I/O, interrupt, and 
other control signals. 

• Address latches to hold addresses beyond the time they are usually on 
the bus. Latches are essential in systems that use address pipelining. 

• Bus buffers to provide more drive current and better signal isolation. 

• Data bus control circuitry to prevent bus contention. Bus contention 
means that more than one device is trying to control the bus at a given 
time. It is usually the result of a device being relatively slow to get off 
the bus. 

• Address decoding circuitry to select memory banks or boards and I/O 
ports. 

Figure 8-14 
Basic memory interface block diagram. 

Figure 8-14 shows the basic memory interface. The circuitry may consist of 
individual TTL devices, programmable array logic (PALs), or PROMs. The bus control 
logic is usually a series of PALs as shown in Figure 8-15. The interface is asynchronous; 
that is, it depends on the exchange of status and control signals rather than on a clock. 

Figure 8-15 
Bus control logic. 

Figure 8-16 
Controller for a 3-CLK DRAM controller. 

Figure 8-17 
Cache memory system. 


A major part of most memory interfaces is the dynamic RAM (DRAM) control cir- 
cuitry. Inexpensive DRAMs generally form the bulk of the memory section. Not only 
do they require refresh (the periodic rewriting of their contents), but they also need a 
brief idle period (precharge time) between accesses. This can slow the memory inter- 
face if the processor often accesses the same DRAM chips repeatedly, as it would in 
the usual arrangement when reading instructions from consecutive or slightly separated 
addresses in memory. 

One way to avoid the need for idle time is to direct successive memory accesses to 
different banks. That is, the next consecutively addressed double word is always in a 
different bank of memory with its own control circuitry. While such an arrangement 
seems odd, it makes no difference to the processor, any more than having successive 
street addresses on opposite sides of the street affects mail delivery. We refer to the 
alternating arrangement as interleaved memory. Figure 8-16 shows a typical circuit in 
which a PAL selects DRAMs, manages refresh, and keeps track of which banks 
require precharge time. 
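
As a rough illustration of interleaving, the bank that holds a given double word can be computed from its address. The Python sketch below assumes two banks and 4-byte double words; the function name `bank_of` is hypothetical and the sketch ignores refresh and precharge bookkeeping.

```python
def bank_of(address: int, banks: int = 2) -> int:
    """Return the bank holding the double word that contains a byte address."""
    double_word_number = address >> 2   # 4 bytes per double word
    return double_word_number % banks   # consecutive double words alternate banks
```

With two banks, byte addresses 0, 4, 8, and 12 fall in banks 0, 1, 0, and 1, so sequential instruction fetches never hit the same bank twice in a row.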

Interleaved memory complicates system debugging. A failure in one memory bank 
causes a problem only with alternating double word addresses. This is clearly much 
more difficult to diagnose than is a problem that affects consecutive addresses. 


CACHE MEMORY 


Cache memory consists of a small amount of fast memory that holds frequently 
accessed data. Figure 8-17 shows the placement of a cache between main memory and 
the 80386 microprocessor. If an accessed location is in the cache, data moves quickly 
between it and the processor. Otherwise, data moves much more slowly between the 
processor and main memory. The contents of the new location are then copied to the 
cache, much as a page fault causes the loading of a new page from disk into main 
memory. 

Among the considerations in cache design are: 

• How do you maximize the number of times requested data is found in 
the cache? We refer to such an occasion as a hit. The alternative is, of 
course, a miss. We call the percentage of hits the hit ratio. 

• How do you quickly determine whether a particular location is in the 
cache? 

• What is the optimum size of the units or blocks of memory in the cache? 
Obviously, most programs access sets of contiguous locations, and the 
processor might as well move them all to the cache at once. 

• How do you handle the changing (writing) of cache locations? New data 
written into the cache must eventually be written into main memory as 
well. Furthermore, external bus masters and DMA controllers must 
update the cache as well as main memory if they make changes. The 
problem is like deciding when to save a document from RAM to disk. 

Of course, caches do not produce something for nothing. There is always some 
reduction of performance, particularly during write cycles. Fortunately, writes are far 
less common than reads. Caches also require extra hardware for control and decoding. 
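
The hit ratio defined above leads directly to an estimate of average access time. The Python sketch below is a simplified model under stated assumptions: a hit costs only the cache access time, a miss costs only the main memory access time, and the timing figures in the usage note are invented for illustration rather than taken from any data sheet.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of accesses that were found in the cache."""
    return hits / (hits + misses)


def average_access_ns(ratio: float, cache_ns: float, main_ns: float) -> float:
    """Weighted average of cache and main memory access times."""
    return ratio * cache_ns + (1.0 - ratio) * main_ns
```

With 90 hits in 100 accesses, a hypothetical 35 ns cache, and a hypothetical 200 ns main memory, the average comes to about 51.5 ns, which shows why even a modest hit ratio pays off.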


Cache Controller 


A cache controller must manage the cache memory system. This device does the 
following tasks: 

• It determines whether a particular address is in the cache. It does this by 
using identification markers (tags) attached to each block. 

• It accesses the cache if the address is there (that is, if a hit occurs). 

• It moves a block of data from main memory to the cache if the address 
is not there (that is, if a miss occurs). 

• It keeps track of changes to the cache and to main memory that is cached. 

LSI implementations of cache controllers are available. For example, Intel offers the 
82385 cache-memory controller that can handle up to 32K bytes of cache memory. It 
interfaces directly to the 80386 processor. Other cache controllers have on-chip 
memory. 


Block Size 


A block is the basic unit of memory that the cache controller moves from main memory 
into the cache at a time. When a needed word is not in the cache, the controller loads 
not only it but also the entire block containing it. Typical sizes for blocks are 4, 8, and 
16 bytes. 
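
To see which addresses travel together, the block containing a given address can be found by masking off the low-order address bits. The Python sketch below assumes that the block size is a power of two, as the typical sizes above are; the function name `block_range` is hypothetical.

```python
def block_range(address: int, block_size: int = 16) -> tuple:
    """Return the first and last byte address of the block containing address."""
    base = address & ~(block_size - 1)   # clear the offset bits within the block
    return base, base + block_size - 1
```

A miss at address 1237 hex therefore loads 1230 through 123F hex with 16-byte blocks, but only 1234 through 1237 hex with 4-byte blocks.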

Obviously, a larger block increases the hit rate if the processor is accessing consecu- 
tive addresses or repeating a short loop. On the other hand, larger blocks take longer 
to move, reduce the number of blocks that fit in a cache, and increase the likelihood of 
unneeded data being placed in the cache. After all, the processor does not access con- 
secutive addresses forever. There are branches, jumps, and widely separated accesses 
to consider. 


Cache Organization 


Common cache organizations are: 

• Fully associative, in which each data block has a tag that completely 
identifies its contents. There are no restrictions on where addresses can go. 
Any memory block could end up in any cache block. To determine 
whether an address is in the cache, the controller simply compares it (or 
part of it) with each tag. 

• Direct mapped, in which each cache block can only hold one of a 
restricted set of memory blocks. The tag then identifies which memory 
block it actually holds. To determine whether an address is in the cache, 
the controller need only check whether its tag is in the position 
corresponding to the block that could hold it. The address cannot be cached 
anywhere else, so no further comparisons are necessary. 

• Set associative, in which the cache consists of several direct mapped 
caches, one of which is selected on an associative basis. This organization 
is clearly a hybrid between the direct mapped cache and the fully 
associative cache. 

Figure 8-18 
Fully associative cache organization. 

The fully associative cache provides the highest hit ratios and the least traffic 
between main and cache memory. After all, there are no restrictions on what addresses 
the cache can contain simultaneously, so there are no patterns of accesses that can 
cause special problems. Figure 8-18 shows a fully associative cache system with 512 
bytes of cache memory (a very small amount) and 16 Mb of main memory. The 
processor may have to perform up to 128 22-bit comparisons to determine whether a 
location is in the cache. Clearly, this will be either quite time consuming (if slow memory 
is used) or expensive (if fast memory is used). Thus fully associative caches are usually 
impractical. New hardware such as content-addressable memories may change this 
situation in the future. Note that a more realistically sized cache (say, 16 Kb) would 
require even more comparisons (4096). It would also require 11 Kb for tags. 

Figure 8-19 
Direct mapped cache organization. 
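
The comparison counts and tag storage quoted for the fully associative cache follow from simple arithmetic. The Python sketch below reproduces them under the assumptions implied by the text: 4-byte blocks and a 24-bit address for 16 Mb of main memory. The function name is hypothetical.

```python
def fully_associative_cost(cache_bytes: int, block_bytes: int = 4,
                           address_bits: int = 24) -> tuple:
    """Return (worst-case comparisons, tag width in bits, tag storage in bytes)."""
    blocks = cache_bytes // block_bytes           # one tag comparison per block
    offset_bits = block_bytes.bit_length() - 1    # log2 of a power-of-two block size
    tag_bits = address_bits - offset_bits         # the tag must identify the block fully
    tag_storage_bytes = blocks * tag_bits // 8
    return blocks, tag_bits, tag_storage_bytes
```

For a 512-byte cache this gives 128 comparisons of 22 bits each; for a 16 Kb cache it gives 4096 comparisons and 11,264 bytes (11 Kb) of tag storage.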





Figure 8-20 
Two-way set associative cache organization. 


Direct Mapped Cache 


In a direct mapped cache such as the one shown in Figure 8-19, only one comparison 
is necessary. An address with a particular index (say, 3EF4 in the example shown) 
can only be at a particular address (3EF4) in the cache. All the cache controller must 
do is compare the cache location’s tag with the tag of the original address. If they are 
the same, the address is in the cache. Here the tag is just bits 16 through 23 of the 
memory address. 

The drawback to direct mapping is that it restricts the possible contents of the cache. 
Locations with the same index cannot be in the cache simultaneously. In the example 
in Figure 8-19, for instance, addresses 13EF4 and 23EF4 cannot be cached 
simultaneously because they would occupy the same block. The result is fewer hits and more 
movement of data between cache and main memory. However, because of their speed 
and hardware simplicity, direct-mapped caches are the most common implementation. 

Note that direct-mapped caches also require less memory for tags, as the tags are 
shorter. In Figure 8-19, for example, the tags are only 8 bits long, whereas they are 22 
bits long in Figure 8-18. 
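
The single comparison in a direct mapped cache can be modeled in a few lines. The Python sketch below follows the layout described for Figure 8-19 (an 8-bit tag from address bits 16 through 23 and a 64K cache addressed by the low 16 bits, in 4-byte blocks). The class and function names are hypothetical, and the model tracks tags only, not data.

```python
def split_address(address: int) -> tuple:
    """Split a 24-bit address into (tag, index) as in the Figure 8-19 layout."""
    tag = (address >> 16) & 0xFF   # bits 16 through 23
    index = address & 0xFFFC       # position of the 4-byte block in the 64K cache
    return tag, index


class DirectMappedCache:
    def __init__(self):
        self.tags = {}             # index -> tag of the block currently held there

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, replace the block at that index."""
        tag, index = split_address(address)
        hit = self.tags.get(index) == tag
        self.tags[index] = tag
        return hit
```

Alternating accesses to 13EF4 and 23EF4 hex miss every time in this model, because both addresses map to index 3EF4 and keep evicting each other.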


Set Associative Caches 


In the set associative cache, as illustrated in Figure 8-20, a particular main memory 
location could go into any one of a small number of blocks in the cache. Figure 8-20 
shows a situation in which there are two possible blocks. The controller must therefore 
make two comparisons to determine whether an address is in the cache. This requires 
either extra time or more hardware, although the hardware requirements are far more 
manageable than in the fully associative case. On the other hand, the associativity 
reduces the likelihood of two accessed locations being uncacheable at the same time. 
Thus a set associative cache is intermediate in cost and performance between a fully 
associative cache and a direct-mapped cache. Of course, we can extend the two-way 
system shown in Figure 8-20 to a four-way system and so on. Each extension increases 
the hardware requirements and the difficulty of deciding where to put new blocks. 
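
A two-way set associative lookup can be sketched in the same style. The Python model below assumes the layout suggested by Figure 8-20 (two 32K banks, a 9-bit tag from address bits 15 through 23) and, as a further assumption not taken from the text, replaces the least recently used of the two blocks on a miss. The class name is hypothetical.

```python
class TwoWayCache:
    def __init__(self):
        self.sets = {}   # index -> list of up to two tags, most recently used last

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, load the block, evicting the LRU tag."""
        tag = (address >> 15) & 0x1FF   # bits 15 through 23
        index = address & 0x7FFC        # block position within a 32K bank
        ways = self.sets.setdefault(index, [])
        hit = tag in ways               # at most two comparisons
        if hit:
            ways.remove(tag)
        elif len(ways) == 2:
            ways.pop(0)                 # assumption: evict the least recently used
        ways.append(tag)
        return hit
```

In this model, addresses 13EF4 and 23EF4 hex can now be cached together; a third address with the same index (such as 33EF4 hex) forces one of them out.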

Cache controllers can implement the associativity hardware in LSI form. For 
example, Intel’s 82385 device allows two-way set associative organizations. Other 
controllers such as the NEC µPD43608R and the Austek Microsystems A38152 allow 
four-way systems. 


Cache Updating 


The updating system ensures that data in the main memory and data in the cache 
agree eventually. This avoids the problem of “stale” data as shown in Figure 8-21. Stale 
data is worse than stale bread; its shelf life is shorter and the odor is unbearable. Two 
common approaches to updating are: 

• Write-through, in which the cache controller immediately writes the data 
into main memory. 

• Write-back, in which the cache controller writes data into main memory 
only when removing a location that has been changed. There is no reason 
to change main memory each time the cache changes. 

Figure 8-21 
Illustration of the stale data problem. 

In a write-through system, the controller writes data to the main memory 
immediately after writing it into the cache. This approach is simple and ensures that the contents 
of the main memory are always valid. On the other hand, the accesses of main memory 
take extra time and occupy the buses. One way to reduce time consumption is by 
buffering write-throughs. The processor can then begin a new cycle before the writing of 
the main memory is completed. The only problem comes when two consecutive cycles 
require main memory accesses either to write data or to handle a cache miss. Then the 
processor must wait for the previous main memory cycle to end before starting the next 
one. Obviously, the buffering also increases circuit complexity and adds to the actual 
cycle time (because of the buffer delays). 

In a write-back system, the tag field of each cache block contains an altered or “dirty” 
bit. This bit is set if new data has been written into the block. It therefore indicates that 
the cache contains different data from the main memory. 

The procedure in a write-back system is to check the altered bit first. If it is set, the 
controller writes the block to main memory before loading new data into the cache. 
The advantage of write-back is that it reduces the number of times main memory has 
to be written. Only locations that are changed and then removed from the cache need 
to be written back into main memory. On the other hand, main memory is not 
necessarily up to date. It may, in fact, contain data from some time ago. Write-back also 
requires a more complex controller than does write-through. Of course, controller 
complexity is largely irrelevant for LSI implementations. 
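
The difference in main memory traffic between the two policies can be shown with a deliberately tiny model. The Python sketch below assumes a cache that holds a single block with one altered bit; the function name and the policy strings are hypothetical, and blocks still in the cache at the end are not flushed.

```python
def count_memory_writes(blocks_written, policy: str) -> int:
    """Count main memory writes for a sequence of processor writes to blocks."""
    if policy not in ("write-through", "write-back"):
        raise ValueError("unknown policy")
    memory_writes = 0
    cached_block = None
    altered = False                      # the "dirty" bit of the single cached block
    for block in blocks_written:
        if block != cached_block:
            if policy == "write-back" and altered:
                memory_writes += 1       # write the altered block back before replacing it
            cached_block, altered = block, False
        if policy == "write-through":
            memory_writes += 1           # every write goes to main memory at once
        else:
            altered = True               # only mark the block; main memory is now stale
    return memory_writes
```

For three writes to block 5 followed by two writes to block 7, write-through performs five main memory writes, whereas write-back performs one (when block 5 is replaced).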


Noncacheable Memory 


Some memory locations cannot be cached. These may include: 

• Memory-mapped I/O devices. The cache locations would not reflect 
external changes. For example, imagine if a memory-mapped keyboard 
were cached. The processor would never see subsequent keystrokes, as 
it would read only the cached location. 

• Interrupt vectors. These are generally accessed so seldom that they are 
not worth caching anyway. 

• Memory shared by several processors. 

One way to differentiate between cacheable and noncacheable memory is by 
decoding some of the more significant address lines. For example, in Figures 8-18 through 
8-20, address bits 24 through 31 are unassigned. These bits could be decoded to select 
either the 16 Mb of cached memory or other noncacheable memory. A simple approach 
is to just use A31 for selection. It could choose between 2 Gb of cacheable memory 
and 2 Gb of noncacheable memory. 
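
Decoding on A31 amounts to testing one address bit. In the Python sketch below, treating the lower 2 Gb (A31 = 0) as the cacheable half is an assumption made for illustration; the text does not say which half is which, and the function name is hypothetical.

```python
def is_cacheable(address: int) -> bool:
    """Return True if a 32-bit address lies in the half selected as cacheable."""
    a31 = (address >> 31) & 1
    return a31 == 0      # assumption: A31 = 0 selects cacheable memory
```

A memory-mapped I/O device would then be placed at an address such as 80000000 hex or above so that its reads always reach the device.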

Noncacheable memory is one way of allowing several processors to share memory. 
Cacheable shared memory can result in stale data, a problem we refer to as 
maintaining cache coherency. On the other hand, frequent accesses to noncacheable memory 
can slow the system significantly. A partial solution to the problem is to copy data to 
cacheable memory as long as only one processor is accessing it. Maintaining 
coherency is also a challenge for systems with DMA controllers. The Intel 82385 cache 
controller helps by detecting writes to cached memory and invalidating the locations. Intel 
refers to this activity as (so help me!) “snooping.” 


Cache Performance 


Table 8-8 contains performance data for various sizes of direct-mapped and set 
associative caches. As mentioned earlier, fully associative caches are currently impractical in 
most situations. Clearly the hit ratios and performance ratio increase with cache size. 
However, the improvement is marginal (about 3 percent in hit rate and about 0.7 
percent in performance ratio) if the cache is 32K or above. The differences between direct 
and set associative caches are small for all sizes shown (32K, 64K, and 128K). The 
payoff would not justify the added hardware complexity. A more profitable approach 
is to increase the line or block size from 4 to 8 bytes. Comparing the performance ratios 
to the bottom two lines of the table shows that a 64K direct-mapped cache gives about 
95 percent of the performance of a system with all high-speed memory (static RAM, or 
SRAM). 


SUMMARY 


The 80386 has an asynchronous memory and I/O interface. That is, it exchanges status 
and control signals (called handshaking) rather than depending on a clock. The 
processor asserts Address Status (ADS#) when a valid address is on the bus. The memory or 
I/O section must respond by asserting READY# when either data is available for 
reading or a write has been completed. The READ/WRITE#, DATA/CONTROL#, and 
MEMORY/IO# signals can be used for bus control, cycle identification, and 
decoding. The NEXT ADDRESS# signal indicates that the memory can accept another 
address; it allows overlapping of memory cycles, thus reducing the need for wait states. 

The 80386 provides a RESET input, two interrupt inputs (maskable and nonmask- 
able), DMA request (HOLD) and acknowledge (HLDA), and a multiprocessor LOCK 
signal. It acknowledges interrupts by performing special bus cycles to obtain an inter- 
rupt vector. 

The 80386 can use either an 80287 or an 80387 numeric coprocessor. The 80287 is 
a 16-bit device, whereas the 80387 is a 32-bit device. The processor sends commands 
to the coprocessor through special I/O port addresses. The coprocessor requests data 
transfers by activating the processor’s PEREQ input. 

Among the ways to reduce memory cost in 80386-based systems while still 
retaining high performance are: 

• Interleave memory addresses so that successive accesses refer to 
different banks. This avoids setup (precharge) time. 

• Keep frequently used data in a small amount of high-speed cache 
memory. A cache controller must determine whether a particular address 
is in the cache, access the cache, move data from main memory to the 
cache, and keep track of changes to the cache and main memory. 

The use of cache memory introduces several new problems, such as deciding how 
and when to write new data into memory and how to maintain coherency between cache 
and main memory. 


INSTRUCTION SET 


This section describes the 80386 instruction set. A table 
lists all instructions along with instruction encoding 
diagrams and clock counts. Further details of the 
instruction encoding are then provided in the following 
sections, which completely describe the encoding 
structure and the definition of all fields occurring within 
80386 instructions. 


80386 INSTRUCTION ENCODING AND 
CLOCK COUNT SUMMARY 


To calculate elapsed time for an instruction, multiply 
the instruction clock count, as listed in Table 8-1 below, 
by the processor clock period (e.g. 62.5 ns for an 
80386-16 operating at 16 MHz (32 MHz CLK2 signal)). 
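
The elapsed-time calculation is a single multiplication. The Python sketch below uses a hypothetical function name and takes the 16 MHz figure from the text as its default; it does not model wait states, HOLD requests, or the other effects that the clock counts exclude.

```python
def elapsed_ns(clock_count: int, clock_mhz: float = 16.0) -> float:
    """Elapsed time in nanoseconds for a given number of processor clocks."""
    clock_period_ns = 1000.0 / clock_mhz   # 62.5 ns at 16 MHz
    return clock_count * clock_period_ns
```

An instruction listed at 18 clocks therefore takes 1125 ns on a 16 MHz part under the stated assumptions.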


For more detailed information on the encodings of 
instructions refer to section 8.2 Instruction Encodings. 
Section 8.2 explains the general structure of instruction 
encodings, and defines exactly the encodings of all 
fields contained within the instruction. 


Instruction Clock Count Assumptions 


1. The instruction has been prefetched, decoded, and is 
ready for execution. 


2. Bus cycles do not require wait states. 


3. There are no local bus HOLD requests delaying 
processor access to the bus. 


4. No exceptions are detected during instruction 
execution. 


5. If an effective address is calculated, it does not use 
two general register components. One register, scal- 
ing and displacement can be used within the clock 
counts shown. However, if the effective address cal- 
culation uses two general register components, add 1 
clock to the clock count shown. 


Instruction Clock Count Notation 


1. If two clock counts are given, the smaller refers to a 
register operand and the larger refers to a memory 
operand. 


2. n = number of times repeated. 


3. m = number of components in the next instruction 
executed, where the entire displacement (if any) 
counts as one component, the entire immediate data 
(if any) counts as one component, and each of the 
other bytes of the instruction and prefix(es) 
counts as one component. 


This appendix is an excerpt from Section 8 of the 80386 Data Sheet. Reprinted courtesy of Intel Corporation, 
Santa Clara, California. 


INSTRUCTION | FORMAT | CLOCK COUNT: Real Address Mode or Virtual 8086 Mode | CLOCK COUNT: Protected Virtual Address Mode | NOTES: Real Address Mode or Virtual 8086 Mode | NOTES: Protected Virtual Address Mode 

GENERAL DATA TRANSFER 
MOV = Move: 
  Register to Register/Memory | 1000100w mod reg r/m | 2/2 | 2/2 | b | h 
  Register/Memory to Register | 1000101w mod reg r/m | 2/4 | 2/4 | b | h 
  Immediate to Register/Memory | 1100011w mod 000 r/m immediate data | 2/2 | 2/2 | b | h 
  Immediate to Register (short form) | 1011 w reg immediate data | 2 | 2 | | 
  Memory to Accumulator (short form) | 1010000w full displacement | 4 | 4 | b | h 
  Accumulator to Memory (short form) | 1010001w full displacement | 2 | 2 | b | h 
  Register/Memory to Segment Register | 10001110 mod sreg3 r/m | 2/5 | 18/19 | b | h, i, j 
  Segment Register to Register/Memory | 10001100 mod sreg3 r/m | 2/2 | 2/2 | b | h 
MOVSX = Move With Sign Extension 
  Register From Register/Memory | 00001111 1011111w mod reg r/m | 3/6 | 3/6 | b | h 
MOVZX = Move With Zero Extension 
  Register From Register/Memory | 00001111 1011011w mod reg r/m | 3/6 | 3/6 | b | h 
PUSH = Push: 
  Register/Memory | 11111111 mod 110 r/m | 5 | 5 | b | h 
  Register (short form) | 01010 reg | 2 | 2 | b | h 
  Segment Register (ES, CS, SS or DS) (short form) | 000 sreg2 110 | 2 | 2 | b | h 
  Segment Register (ES, CS, SS, DS, FS or GS) | 00001111 10 sreg3 000 | 2 | 2 | b | h 
  Immediate | 011010s0 immediate data | 2 | 2 | b | h 
PUSHA = Push All | 01100000 | 18 | 18 | b | h 
POP = Pop 
  Register/Memory | 10001111 mod 000 r/m | 5 | 5 | b | h 
  Register (short form) | 01011 reg | 4 | 4 | b | h 
  Segment Register (ES, CS, SS or DS) (short form) | 000 sreg2 111 | 7 | 21 | b | h, i, j 
  Segment Register (ES, CS, SS or DS), FS or GS | 00001111 10 sreg3 001 | 7 | 21 | b | h, i, j 
POPA = Pop All | 01100001 | 24 | 24 | b | h 
XCHG = Exchange 
  Register/Memory With Register | 1000011w mod reg r/m | 3/5 | 3/5 | b, f | f, h 
  Register With Accumulator (short form) | 10010 reg | 3 | 3 | | 
IN = Input from: 
  Fixed Port | 1110010w port number | 12 (†26) | 6*/26** | | m 
  Variable Port | 1110110w | 13 (†27) | 7*/27** | | m 
OUT = Output to: 
  Fixed Port | 1110011w port number | 10 (†24) | 4*/24** | | m 
  Variable Port | 1110111w | 11 (†25) | 5*/25** | | m 
LEA = Load EA to Register | 10001101 mod reg r/m | 2 | 2 | | 

† Clock count in Virtual 8086 Mode     * If CPL ≤ IOPL     ** If CPL > IOPL 
INSTRUCTION 


SEGMENT CONTROL 


LDS = Load Pointer to DS 


LES = Load Pointer to ES 
LFS = Load Pointer to FS 
LGS = Load Pointer to GS 
LSS = Load Pointer to SS 
FLAG CONTROL 


CLC = Clear Carry Flag 

CLD = Clear Direction Flag 

CLI = Clear Interrupt Enable Flag 
CLTS = Clear Task Switched Flag 
CMC = Complement Carry Flag 
LAHF = Load AH into Flag 

POPF = Pop Flags 

PUSHF = Push Flags 

SAHF = Store AH Into Flags 

STC = Set Carry Flag 

STD = Set Direction Flag 

STI = Set Interrupt Enable Flag 


ARITHMETIC 
ADD = Add 


Register to Register 

Register to Memory 

Memory to Register 

Immediate to Register/Memory 
Immediate to Accumulator (short form) 
ADC = Add With Carry 

Register to Register 

Register to Memory 

Memory to Register 

Immediate to Register/Memory 
Immediate to Accumulator (short form) 


INC = Increment 


Register/Memory 
Register (short form) 
SUB = Subtract 


Register from Register 
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FORMAT 


| 11000101 | modreg- r/m| 


11000100 |modreg _¢/m| 


| 00001111 | 10110100 |-madreg — t/m 


| 00001111 | 1011010 | mod reg rim | 
00001111 10110010 | 





| 11111000 | 
| 11111100 | 
| 11111010 | 


00001111 | 00000110 








11110101 | 
10011111 


| 10011101 | 
| 10011110 | 
| 11111001 | 


000000dw 





| O000000w | modreg r/m 


0000001 





| 100000sw | mod000 t/m | immediate data 
| 0000010w | immediate data 


|000100d~ 


0001000w | modreg r/m 





100000Sw |mod010, r/m]| immediate cata 


0001010w | immediate data 


| 1111717141w | mod 0.00 r/m | 


01000 reg 
001010dw | modreg com | 
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CLOCK COUNT 


Protected | 
Virtual 
Address 
Mode 


22 


22 


Zz 


25 


22 
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ARITHMETIC (Continued) 

Register from Memory 

Memory from Register 

Immediate from Register/Memory 
Immediate from Accumulator (short form) 
SBB = Subtract with Borrow 

Register from Register 

Register from Memory 

Memory from Register 

Immediate from Register /Memory 
Immediate from Accumulator (short form) 
DEC = Decrement 

Register/Memory 

Register (short form) 

CMP = Compare 

Register with Register 

Memory with Register 

Register with Memory 

Immediate with Register/Memory 
Immediate with Accumulator (short form) 


NEG = Change Sign 


AAA = ASCII Adjust for Add 
AAS = ASCII Adjust for Subtract 
DAA = Decimal Adjust for Add 


DAS = Decimal Adjust for Subtract 
MUL = Multiply (unsigned) 


Accumulator with Register/Memory 
Multiplier-Byte 
-Word 
-Doubleword 
IMUL = Integer Multiply (signed) 
Accumulator with Register/Memory 
Multiplier-Byte 
-Word 
-Doubleword 


Register with Register/Memory 


-Word 
-Doubleword 
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FORMAT 


0010100w |modrag _tvm 


0010101w |mod reg rem 


100000sw imod101_ r/m| immediate data 


0 010110w immediate data 


000110dw [mod reg _r/m 


0001100w |modreg r/m 


| 









}0001101w jmod reg r/m 


100000sw mod011-— f/m immediate dala 


00011%410w immediate data 


| 


1111111wlreg001  r/m 


= 
Oo 
Oo 


reg 


001110dw |mod reg n/m 


0011100w 


mod reg rim 


0011101w |modreg- r/m 


mi} tmmediate dala 


100000sw modii11  r/ 


0011110w immediate data 





1111011wIimod011 = &r/m 





00110111) 


JO Due 1) 
00100111 


00101117 | 


1111011wWws]mod100 fr/im 





1111011W mod100 f/m 


00001111— 





10-100. 1 AS 





Register/Memory with Immediate to Register mod reg rim immediate data 


-Word 
-Doubleword 
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mod reg r/m| 


CLOCK COUNT 


Real 
Address 
Mode or 

Virtual 
8086 
Mode 


2/7 


2/7 


2/6 


2/5 


2/6 


9-14/12-17 
9-22/12-25 
0-938/12-41 


9-14/12-17 
9~22/12-25 
9-96/12-41 


9-22/12-25 


9-96/12-41 


9~22/12-25 
9~98/12-41 





Protected 
Virtual 
Address 
Mode 


2/7 


2/5 


2/6 


9-14/12-17 
9-22/12-25 
9-96/12-41 


9-14/12-17 
9-22/12-25 
9-96/12-41 


9-22/12-25 
9-38/12-41 


9~-22/12-25 
9-36/12-41 








NOTES 
Feal 
Address Protected 
Mode or Virtual 
Virtual Address 
8086 Mode 
Mode 
b h 
b h 
b h 
b h 
b h 
b h 
b n 
b a 
b hn 
b a 
b n 
b.d d,hn 
b.d dn 
bye d,h 
b,d d, fh 
b,d d,h 
b,d d,h 
b. d d,h 
b,d d,h 
b,d d,n 
b.d dh 
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ARITHMETIC (Continued) 
DIV = Divide (Unsigned) 


Accumulator by Register/Memory 


Divisor—Byte 
—Word 
—Doubleword 


IDIV = Integer Divide (Signed) 
Accumulator By Register/Memory 


Divisor—Byte 
—Word 
—Double word 


AAD = ASCII Adjust for Divide 
AAM = ASCII Adjust for Multiply 


CBW = Convert Byte to Word 


Shift Rotate Instructions 


FORMAT 


1111011w |mod110 = f/m 


1111011w |mod111 = F/m 


11010101 | 00001010 | 











10011000 | 
CWD = Convert Word to Double Word | 10011001 


LOGIC 


Not Through Carry (ROL, ROR, SAL, SAR, SHL, and SHR) 


Register/Memory by 1 


Register/Memory by CL 


Register/Memory by Immediate Count | 1100000w immed 8-bit data 
1101000m 


Through Carry (RCL and RCR) 
Register/Memory by 1 


Register/Memory by CL 


SHLD = Shift Left Double 
Register/Memory by Immediate 
Register/Memory by CL. 


SHRD = Shift Right Double 


Register/Memory by Immediate 
Register/Memory by CL 


AND = And 


Register to Register 






1101000w |mod TTT _t/m 
1101001w |mod TTT _1/m 








110100 1w |mod TTT _t/m 


Register/Memory by Immediate Count | 1100000w immed 8-bit data 


TTT Instruction 


000 ROL 
001 ROR 
010 RCL 
011 RCR 
100 SHL/SAL 
101 SHR 
111 SAR 


| 00001111 | 10100100 mod reg r/mlimmed 8-bit data 
| 00001111 | 10100101 | mod reg r/m| 


| 00001111 | 10101100 |mod rag r/ mlimmed 8-bit data | 


00001111 /]10101101 


| 001000dw |mod reg r/m 














11010100 | 00001010 | 


LILO 


Address | Protected | Address | Protected 


Mode or 
Virtual 
8086 
Mode 


14/17 
22/25 
38/41 


19/22 
277/30 
43/46 


19 


3/7 


3/7 


3/7 


9/10 


9/10 


9/10 


3/7 


3/7 


3/7 


3/7 


CLOCK COUNT 








Virtual 
Address 
Mode 


14/17 
22/25 
38/41 


19/22 
27/30 
43/46 


19 


17 


3/7 


3/7 


3/7 


9/10 


9/10 


9/10 


3/7 


3/7 


3/7 


3/7 






Mode or 
Virtual 
8086 
Mode 


b.e 
b.e 
5.3 


b.e 
b.e 
b.e 












Virtual 


| Address 


Mode 


e,h 
eh 
e,h 


e,n 
e.n 
eh 
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LOGIC (Continued) 
Register to Memory 
Memory to Register 
Immediate to Register/Memory 


Immediate to Accumulator (Short Form) 


TEST = And Function to Flags, No Result 


Register/Memory and Register 


Immediate Data and Register/Memory 


Immediate Data and Accumulator 
(Short Form) 


OR = Or 


Register to Register 
Register to Memory 
Memory to Register 
Immediate to Register/Memory 


Immediate to Accumulator (Short Form) 
XOR = Exclusive Or 
Register to Register 


Register to Memory 
Memory to Register 
Immediate to Register/Memory 


Immediate to Accumulator (Short Form) 


NOT = Invert Register/Memory 

STRING MANIPULATION 

CMPS = Compare Byte Word 

INS = Input Byte/Word from DX Port 

LODS = Load Byte/Word to AL/AX/EAX 

MOVS = Move Byte Word 

OUTS = Output Byte/Word to DX Port 

SCAS = Scan Byte Word 

STOS = Store Byte/Word from 
AL/AX/EAX 

XLAT = Translate String 


REPEATED STRING MANIPULATION 
Repeated by Count in CX or ECX 
REPE CMPS = Compare String 

(Find Non-Match) 


if CPL <= IOPL 


FORMAT 


| 0010000w |mod reg rim| 





0010001 w |modreg r/m 


| 1000000 mod 100 r/m| immediate data 


, 9010010w | immediate data 


f 


1000010w |modreg ré/m 


117111011w |mod000 —r/my] immediate data 


1010100w 


immediate data 


000010dw |modreg rém 


~0000100w |modreg r/m 


0000101w |modreg r/m 


1000000w |mod001_ r/m| immediate data 


0000110w | immediate data 


001100dw |modreg r/m 


0011000w |mod reg r/m 


i 
i 


0011001w |mod reg r/m 


1000000WwW mod110 r/m] immediate data 


1 0011010w | immediate data 


111101170 modQ 10 r/m 


-1010011w 


|0110110w 
1010110w 


1010010Ww 





01101113 


1010111Ww 


1010101w 


LEE 


11010111 





11110011 | 1010011w 


°° If CPL > IOPL 
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Clk 
Count 
Virtual 

8086 
Mode 


T29 


128 











Real 


8088 
Mode 


2/5 


2/5 


2/7 


2/7 


2/6 


10 


15 


14 


5+9n 


CLOCK COUNT 


Virtual 


Addreaa 


Mode 


2/7 


2/5 


2/5 


2/7 


2/7 


2/6 


10 


9°/29°° 


5 


Z 


8*/28°° 


7 


5+9n 














Addreaa | Protected | Addreaa | Protected 
Mode or 
Virtual 


Virtual 
Addreas 
Mode 





CLOCK COUNT | 


NOTES 










































Notes: 











Real Real 
INSTRUCTION FORMAT Address| Protected Addreas | Protected 
Mode or Virtual Modeor| Virtual 
Virtual | Address Virtual | Address 
8086 Mode 8086 Mode 
Mode Mode 
REPEATED STRING MANIPULATION (Continued) 
REPNE CMPS = Compare String Clk Count 
| | | | Virtual 
REP INS = Input String | 11110010 | 0110110w | {27+6n 139+6n |7+6n°/27+6n**| b h,m 
REP LODS = Load String | 11110010 | 1010110w| 5+6n 5+6n b h 
REP MOVS = Move String | 14 4 4-016 | 1010010w | 7+4n 7+4n b h 
REP OUTS = Output String 41110010. O110111W. T26+5n 12+5n |6+5n°/26+5n°° b h,m 
REPE SCAS = Scan String 
(Find Non-AL/AX/EAX)} 171110011 | 1010111W] 5 + 8n 5+8n b n 
REPNE SCAS = Scan String | 
(Find AL/AX/EAX) | 11110010 | 1010111w| 5+ 8n 5+8n b h 
REP STOS = Store String 1010101wWw 5+5n 5+5n b h 
BiT MANIPULATION 
BSF = Scan Bit Forward | 00001111 | 10111100 |mod reg r/m| 10+d9n | 10+3n b h 
BSR = Scan Bit Reverse 00001111 | 10111101 10+9n | 10+3n b h 
BT = Teat Bit 
Register/Memory, Immediate 00001111 ])10111010 jmod100 _=  r/mijimmed 8-bit data} 3/6 3/6 b h 
Register/Memory, Register 00001111 ];10100011 3/12 3/12 b h 
BTC = Teat Bit and Complement 
Register/Memory, Imm@iate | 00001111 | 10111010 |mod 111 r/mlimmed 8-bit data| 6/8 6/8 b h 
Register/Memory, Register 00001111 110111011 |modreg —t/m 6/13 6/13 b h 
BTR = Teat Bit and Reset 
Register/Memory, Immediate 00001111 | 10111010 lmod 110 r/mlimmed 8-bit data| | 6/8 6/8 b n 
Register/Memory, Register 00001111);10110011 6/193 6/13 b h 
BTS = Teat Bit and Set 
Register/Memory, Immediate | 00001111/;10111010 |mod 101 r/miimmed 8-bit data 6/8 6/8 b h 
Register/Memory, Register 00001111 6/193 6/13 b h 
CONTROL TRANSFER 
CALL = Cail 
Direct Within Segment 11101000 | full displacement 7+mM 7+mM b f 
Register/Memory 
ne / 7+m/ 
Indirect Within Segment 1177171711 |mod010 = r/m fa eo ue h,r 
Direct Intersegment | 10011010 |unsigned tull offset, selector 17+m 34+m jkr 


+ Clock count shown applies if |/O permission allows |/O to the port in virtual 8086 mode. If 1/O bit map denies permission 
exception 13 fault occurs; refer to clock counts for INT 3 instruction. 


"$f CPL < IOPL 


** If CPL > IOPL 
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CLOCK COUNT | NOTES 


Real | 
Address | Protected | 








INSTRUCTION FORMAT Address | Protected 


Modeor| Virtual | Virtual 
Virtual Address Address 
8086 Mode Mode 
Mode 
CONTROL TRANSFER (Continued)
Protected Mode Only (Direct Intersegment)
  Via Call Gate to Same Privilege Level                          52+m       h,j,k,r
  Via Call Gate to Different Privilege Level, (No Parameters)    86+m       h,j,k,r
  Via Call Gate to Different Privilege Level, (x Parameters)     94+4x+m    h,j,k,r
  From 286 Task to 286 TSS                                       273        h,j,k,r
  From 286 Task to 386 TSS                                       298        h,j,k,r
  From 286 Task to Virtual 8086 Task (386 TSS)                   217        h,j,k,r
  From 386 Task to 286 TSS                                       273        h,j,k,r
  From 386 Task to 386 TSS                                       300        h,j,k,r
  From 386 Task to Virtual 8086 Task (386 TSS)                   217        h,j,k,r

Indirect Intersegment   | 11111111 | mod 011 r/m |      22+m     38+m       b     h,j,k,r
Protected Mode Only (Indirect Intersegment)
  Via Call Gate to Same Privilege Level                          56+m       h,j,k,r
  Via Call Gate to Different Privilege Level, (No Parameters)    90+m       h,j,k,r
  Via Call Gate to Different Privilege Level, (x Parameters)     98+4x+m    h,j,k,r
  From 286 Task to 286 TSS                                       278        h,j,k,r
  From 286 Task to 386 TSS                                       303        h,j,k,r
  From 286 Task to Virtual 8086 Task (386 TSS)                   221        h,j,k,r
  From 386 Task to 286 TSS                                       278        h,j,k,r
  From 386 Task to 386 TSS                                       305        h,j,k,r
  From 386 Task to Virtual 8086 Task (386 TSS)                   221        h,j,k,r

JMP = Unconditional Jump
  Short                     | 11101011 | 8-bit displacement |            7+m        7+m              r
  Direct within Segment     | 11101001 | full displacement |             7+m        7+m              r
  Register/Memory Indirect
    within Segment          | 11111111 | mod 100 r/m |                   7+m/10+m   7+m/10+m   b     h,r
  Direct Intersegment       | 11101010 | unsigned full offset, selector | 12+m      27+m             j,k,r

Protected Mode Only (Direct Intersegment)
  Via Call Gate to Same Privilege Level                          45+m       h,j,k,r
  From 286 Task to 286 TSS                                       274        h,j,k,r
  From 286 Task to 386 TSS                                       301        h,j,k,r
  From 286 Task to Virtual 8086 Task (386 TSS)                   218        h,j,k,r
  From 386 Task to 286 TSS                                       270        h,j,k,r
  From 386 Task to 386 TSS                                       303        h,j,k,r
  From 386 Task to Virtual 8086 Task (386 TSS)                   220        h,j,k,r

Indirect Intersegment   | 11111111 | mod 101 r/m |      17+m     31+m       b     h,j,k,r

Protected Mode Only (Indirect Intersegment)
  Via Call Gate to Same Privilege Level                          49+m       h,j,k,r
  From 286 Task to 286 TSS                                       279        h,j,k,r
  From 286 Task to 386 TSS                                       306        h,j,k,r
  From 286 Task to Virtual 8086 Task (386 TSS)                   222        h,j,k,r
  From 386 Task to 286 TSS                                       275        h,j,k,r
  From 386 Task to 386 TSS                                       308        h,j,k,r
  From 386 Task to Virtual 8086 Task (386 TSS)                   224        h,j,k,r
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Appendix A 


| CLOCKCOUNT | NOTES 


Real Real 
INSTRUCTION FORMAT Addreaa | Protected | Address Protected 
Modeor Virtual | Modeor Virtual 
Virtual Addreaa Virtual Addreaa 
8086 Mode 8086 Mode 
Mode | Mode 





CONTROL TRANSFER (Continued)
RET = Return from CALL:
  Within Segment                          | 11000011 |                  10+m   10+m   b   g,h,r
  Within Segment Adding Immediate to SP   | 11000010 | 16-bit displ |   10+m   10+m   b   g,h,r
  Intersegment                            | 11001011 |                  18+m   32+m   b   g,h,j,k,r
  Intersegment Adding Immediate to SP     | 11001010 | 16-bit displ |   18+m   32+m   b   g,h,j,k,r

Protected Mode Only (RET):
to Different Privilege Level
  Intersegment                                                                 68         h,j,k,r
  Intersegment Adding Immediate to SP                                          68         h,j,k,r

CONDITIONAL JUMPS
NOTE: Times Are Jump “Taken or Not Taken”

JO = Jump on Overflow
  8-Bit Displacement   | 01110000 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10000000 | full displacement |   7+m or 3   7+m or 3   r
JNO = Jump on Not Overflow
  8-Bit Displacement   | 01110001 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10000001 | full displacement |   7+m or 3   7+m or 3   r
JB/JNAE = Jump on Below/Not Above or Equal
  8-Bit Displacement   | 01110010 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10000010 | full displacement |   7+m or 3   7+m or 3   r
JNB/JAE = Jump on Not Below/Above or Equal
  8-Bit Displacement   | 01110011 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10000011 | full displacement |   7+m or 3   7+m or 3   r
JE/JZ = Jump on Equal/Zero
  8-Bit Displacement   | 01110100 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10000100 | full displacement |   7+m or 3   7+m or 3   r
JNE/JNZ = Jump on Not Equal/Not Zero
  8-Bit Displacement   | 01110101 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10000101 | full displacement |   7+m or 3   7+m or 3   r
JBE/JNA = Jump on Below or Equal/Not Above
  8-Bit Displacement   | 01110110 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10000110 | full displacement |   7+m or 3   7+m or 3   r
JNBE/JA = Jump on Not Below or Equal/Above
  8-Bit Displacement   | 01110111 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10000111 | full displacement |   7+m or 3   7+m or 3   r
JS = Jump on Sign
  8-Bit Displacement   | 01111000 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10001000 | full displacement |   7+m or 3   7+m or 3   r
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INSTRUCTION FORMAT 


CLOCK COUNT: Real Address Mode or Virtual 8086 Mode, then Protected Virtual Address Mode
NOTES: Protected Virtual Address Mode

CONDITIONAL JUMPS (Continued)
JNS = Jump on Not Sign
  8-Bit Displacement   | 01111001 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10001001 | full displacement |   7+m or 3   7+m or 3   r
JP/JPE = Jump on Parity/Parity Even
  8-Bit Displacement   | 01111010 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10001010 | full displacement |   7+m or 3   7+m or 3   r
JNP/JPO = Jump on Not Parity/Parity Odd
  8-Bit Displacement   | 01111011 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10001011 | full displacement |   7+m or 3   7+m or 3   r
JL/JNGE = Jump on Less/Not Greater or Equal
  8-Bit Displacement   | 01111100 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10001100 | full displacement |   7+m or 3   7+m or 3   r
JNL/JGE = Jump on Not Less/Greater or Equal
  8-Bit Displacement   | 01111101 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10001101 | full displacement |   7+m or 3   7+m or 3   r
JLE/JNG = Jump on Less or Equal/Not Greater
  8-Bit Displacement   | 01111110 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10001110 | full displacement |   7+m or 3   7+m or 3   r
JNLE/JG = Jump on Not Less or Equal/Greater
  8-Bit Displacement   | 01111111 | 8-bit displ |                    7+m or 3   7+m or 3   r
  Full Displacement    | 00001111 | 10001111 | full displacement |   7+m or 3   7+m or 3   r

JCXZ = Jump on CX Zero     | 11100011 | 8-bit displ |                9+m or 5   9+m or 5   r
JECXZ = Jump on ECX Zero   | 11100011 | 8-bit displ |                9+m or 5   9+m or 5   r
(Address Size Prefix Differentiates JCXZ from JECXZ)

LOOP = Loop CX Times                     | 11100010 | 8-bit displ |  11+m       11+m       r
LOOPZ/LOOPE = Loop with Zero/Equal       | 11100001 | 8-bit displ |  11+m       11+m       r
LOOPNZ/LOOPNE = Loop While Not Zero      | 11100000 | 8-bit displ |  11+m       11+m       r

CONDITIONAL BYTE SET
NOTE: Times Are Register/Memory

SETO = Set Byte on Overflow
  To Register/Memory   | 00001111 | 10010000 | mod 000 r/m |         4/5        4/5        h
SETNO = Set Byte on Not Overflow
  To Register/Memory   | 00001111 | 10010001 | mod 000 r/m |         4/5        4/5        h
SETB/SETNAE = Set Byte on Below/Not Above or Equal
  To Register/Memory   | 00001111 | 10010010 | mod 000 r/m |         4/5        4/5        h
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| CLOCK COUNT | NOTES _ 
Real Real 
INSTRUCTION FORMAT Address | Protected Address Protected 
| Mode or Virtual Mode or Virtual 
Virtual Address Virtual Address 
8086 Mode 8086 Mode 
== Mode Mode | 
CONDITIONAL BYTE SET (Continued)
SETNB = Set Byte on Not Below/Above or Equal
  To Register/Memory   | 00001111 | 10010011 | mod 000 r/m |   4/5   4/5   h
SETE/SETZ = Set Byte on Equal/Zero
  To Register/Memory   | 00001111 | 10010100 | mod 000 r/m |   4/5   4/5   h
SETNE/SETNZ = Set Byte on Not Equal/Not Zero
  To Register/Memory   | 00001111 | 10010101 | mod 000 r/m |   4/5   4/5   h
SETBE/SETNA = Set Byte on Below or Equal/Not Above
  To Register/Memory   | 00001111 | 10010110 | mod 000 r/m |   4/5   4/5   h
SETNBE/SETA = Set Byte on Not Below or Equal/Above
  To Register/Memory   | 00001111 | 10010111 | mod 000 r/m |   4/5   4/5   h
SETS = Set Byte on Sign
  To Register/Memory   | 00001111 | 10011000 | mod 000 r/m |   4/5   4/5   h
SETNS = Set Byte on Not Sign
  To Register/Memory   | 00001111 | 10011001 | mod 000 r/m |   4/5   4/5   h
SETP/SETPE = Set Byte on Parity/Parity Even
  To Register/Memory   | 00001111 | 10011010 | mod 000 r/m |   4/5   4/5   h
SETNP/SETPO = Set Byte on Not Parity/Parity Odd
  To Register/Memory   | 00001111 | 10011011 | mod 000 r/m |   4/5   4/5   h
SETL/SETNGE = Set Byte on Less/Not Greater or Equal
  To Register/Memory   | 00001111 | 10011100 | mod 000 r/m |   4/5   4/5   h
SETNL/SETGE = Set Byte on Not Less/Greater or Equal
  To Register/Memory   | 00001111 | 10011101 | mod 000 r/m |   4/5   4/5   h
SETLE/SETNG = Set Byte on Less or Equal/Not Greater
  To Register/Memory   | 00001111 | 10011110 | mod 000 r/m |   4/5   4/5   h
SETNLE/SETG = Set Byte on Not Less or Equal/Greater
  To Register/Memory   | 00001111 | 10011111 | mod 000 r/m |   4/5   4/5   h
ENTER = Enter Procedure   | 11001000 | 16-bit displacement, 8-bit level |
  L = 0                                     10              10              b   h
  L = 1                                     12              12              b   h
  L > 1                                     15 + 4(n - 1)   15 + 4(n - 1)   b   h
LEAVE = Leave Procedure   | 11001001 |      4               4               b   h
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CLOCK COUNT ___NOTES 


Real Real 
INSTRUCTION FORMAT Address | Protected Address Protected 
Mode or Virtual Mode or Virtua! 
Virtual Address Virtual Address 
8086 Mode 8086 Mode 
Mode Mode 





INTERRUPT INSTRUCTIONS
INT = Interrupt:
  Type Specified   | 11001101 | type |              37          b
  Type 3           | 11001100 |                     33          b

INTO = Interrupt 4 If Overflow Flag Set
  If OF = 1                                         35          b,e
  If OF = 0                                         3     3     b,e

BOUND = Interrupt 5 If Detect Value Out of Range   | 01100010 | mod reg r/m |
  If Out of Range                                   44          b,e   e,g,h,j,k,r
  If In Range                                       10    10    b,e   e,g,h,j,k,r

Protected Mode Only (INT)
INT: Type Specified
  Via Interrupt or Trap Gate to Same Privilege Level                    59     g,j,k,r
  Via Interrupt or Trap Gate to Different Privilege Level               99     g,j,k,r
  From 286 Task to 286 TSS via Task Gate                                282    g,j,k,r
  From 286 Task to 386 TSS via Task Gate                                309    g,j,k,r
  From 286 Task to virt 8086 md via Task Gate                           226    g,j,k,r
  From 386 Task to 286 TSS via Task Gate                                284    g,j,k,r
  From 386 Task to 386 TSS via Task Gate                                311    g,j,k,r
  From 386 Task to virt 8086 md via Task Gate                           228    g,j,k,r
  From virt 8086 md to 286 TSS via Task Gate                            289    g,j,k,r
  From virt 8086 md to 386 TSS via Task Gate                            316    g,j,k,r
  From virt 8086 md to priv level 0 via Trap Gate or Interrupt Gate     119

INT: Type 3
  Via Interrupt or Trap Gate to Same Privilege Level                    59     g,j,k,r
  Via Interrupt or Trap Gate to Different Privilege Level               99     g,j,k,r
  From 286 Task to 286 TSS via Task Gate                                278    g,j,k,r
  From 286 Task to 386 TSS via Task Gate                                305    g,j,k,r
  From 286 Task to virt 8086 md via Task Gate                           222    g,j,k,r
  From 386 Task to 286 TSS via Task Gate                                280    g,j,k,r
  From 386 Task to 386 TSS via Task Gate                                307    g,j,k,r
  From 386 Task to virt 8086 md via Task Gate                           224    g,j,k,r
  From virt 8086 md to 286 TSS via Task Gate                            285    g,j,k,r
  From virt 8086 md to 386 TSS via Task Gate                            312    g,j,k,r
  From virt 8086 md to priv level 0 via Trap Gate or Interrupt Gate     119

INTO:
  Via Interrupt or Trap Gate to Same Privilege Level                    59     g,j,k,r
  Via Interrupt or Trap Gate to Different Privilege Level               99     g,j,k,r
  From 286 Task to 286 TSS via Task Gate                                280    g,j,k,r
  From 286 Task to 386 TSS via Task Gate                                307    g,j,k,r
  From 286 Task to virt 8086 md via Task Gate                           224    g,j,k,r
  From 386 Task to 286 TSS via Task Gate                                282    g,j,k,r
  From 386 Task to 386 TSS via Task Gate                                309    g,j,k,r
  From 386 Task to virt 8086 md via Task Gate                           226    g,j,k,r
  From virt 8086 md to 286 TSS via Task Gate                            287    g,j,k,r
  From virt 8086 md to 386 TSS via Task Gate                            314    g,j,k,r
  From virt 8086 md to priv level 0 via Trap Gate or Interrupt Gate     119
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INSTRUCTION 


FORMAT 


INTERRUPT INSTRUCTIONS (Continued) 
BOUND: 


Via Interrupt or Trap Gate 

to Same Privilege Level 
Via interrupt or Trap Gate 

to Ditferent Privilege Level 
From 286 Task to 286 TSS via Task Gate 
From 286 Task to 386 TSS via Task Gate 
From 268 Task to vit 8086 Mode via Task Gate 
From 386 Task to 286 TSS via Task Gate 
From 386 Task to 386 TSS via Task Gate 
From 368 Task to virt 8086 Mode via Task Gate 
From virt 8086 Mode to 286 TSS via Task Gate 
From vit 8086 Mode to 386 TSS via Task Gate 
From vitt 8086 md to priv level 0 via Trap Gate or Interrupt Gate 


INTERRUPT RETURN 


IRET = Interrupt Return 


| 11001191 | 


Protected Mode Only (IRET) 
To the Same Privilege Level (within task) 
To Different Privilege Level (within task) 
From 286 Task to 286 TSS 
From 286 Task to 386 TSS 
From 286 Task to Virtual 8086 Task 
From 286 Task to Virtual 8086 Mode {within task) 
From 386 Task to 286 TSS 
From 386 Task to 386 TSS 
From 386 Task to Virtual 8086 Task 
From 386 Task to Virtual 8086 Mode (within task) 


PROCESSOR CONTROL 
MOV = 


CRO/CR2/CR3 trom register 
Register From CRO-3 
DRO-3 From Register 
DR6- 7 From Register 


Register from DR6—-7 


Move to and From Control/Debug/ Teast Registers 


CLOCK COUNT 


Real 
Address 
Mode or 

Virtual 

8086 

Mode 


22 





000011114 


| 00001111 


00001111 





00100010 


00100000 6 
pics cn aati » 


00100011 


| 00001111 | 00100001 | 1 + eee reg 14 


00100001 





1 4 eee reg 10/4/5 


1 4 eee reg 16 





1 1 eee reg 22 


Register from DRO~-3 00001111 

TR6-7 from Register 00001111 | 00100110 | 1 1 eee reg 12 

Register from FR6-7 12 
NOP = No Operation | 10010000 | 3 
WAIT = Wait untli BUS Y# pin is negated | 10011011 | 6 
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Protected | 


Virtual 


Address 


Mode 


59 


99 
254 
284 
231 
264 
294 
243 
264 
294 
119 


38 
82 
232 
265 
244 
60 
271 
275 
224 


10/4/5 


6 


22 


16 


Address 
Mode or 
Virtual 


6066 
Mode 


NOTES 


Protected 


Virtual 


Address 


Mode 


Q.J kur 


g.j.Ke 
g.j.kK ye 
g.J.K.0 
Q.j.Kue 
g.J, kr 
g.J. kr 
g.j. k, 6, 
g4.kr 
g.j. KF 


gn jkr 


g. nj. ke 
gn), ke 


h, j,k, ¢ 
h,j.Kve 
hj k, ¢ 


hiyj.ke 
hej. ke 
hj, k, r 
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INSTRUCTION FORMAT 


PROCESSOR EXTENSION INSTRUCTIONS 


Processor Extension Escape 


PREFIX BYTES 
Address Size Prefix 01 1001 11 
LOCK = Bua Lock Prefix 11110000 
Operend Size Prefix 01100110 
Segment Override Prefix 


00101110 


CS: | 
Ds 
ES 
FS: | 011 00 100 | 
GS: | 01100101 | 
ss 
PROTECTION CONTROL 
ARPL = Adjust Requested Privitege Level 
From Register/Memory 
LAR = Losd Access Rights 


From Register/Memory 00001111 


LGDT = Loed Globel Descriptor 
Tabie Register 


LIDT = Losd Interrupt Descriptor 


Jt 


00001111 | 00000001 


| 11011TTT | mod LLL rim | 


TTT and LLL bits are opcode 
informatron for coprocessor. 


00000010 | mod reg r/m 


mod0Q10 rim 


Table Register 00001111 O000000t | mod011 = r/m 
LLDT = LosdLocel Dascriptor 

Table Register to — 

Register /Memory | 00001111 | 00000000 | mod010 f/m 


LMSW = Losd Mechine Stetus Word 


From Register/Memory 000011114 


LSL = Loed Segment Limit 
From Register/Memory | 00001111 


Byte-Granuiar Limit 
Page-Granular Limit 


LTA = Loed Task Register 

From Register/Memory 00001111 
SGDT = Store Global Descriptor 

Table Ragister 00001111 


00000000 


00000001 


00000001 |mod110 rim] 


00000011 | modreg rim 


mod001 ¢r/m 


mod 0 00 r/mM 
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CLOCK COUNT 





Reel | 
Address 
Mode or | 


Virtual 


8086 | 
Mode | 


See 
80287/80387: 
data sheets for | 


clock counts | 


N/A 
N/A 


11 


N/A 


10/13 


N/A 
N/A 


N/A 


Protected 
Virtual 
Address 
Mode 


20/21 


35/16 


11 


11 


20/24 


10/13 


20/21 
25/26 


23/27 





NOTES 
Feel 
Address Protected 
Mode or Virtuel 
Virtuel Address 
8066 Mode 
Mode 
h 
m 
/ 
a h 
a g. h, \, p 
bic h, | 
b,¢ h, | 
a g.hj,| 
b,¢ h, | 
a gh.j,p 
a Luli ls'p 
a g,h.j,| 
b,c h 





                                              CLOCK COUNT                  NOTES
                                         Real                         Real
INSTRUCTION   FORMAT                     Address      Protected       Address      Protected
                                         Mode or      Virtual         Mode or      Virtual
                                         Virtual      Address         Virtual      Address
                                         8086 Mode    Mode            8086 Mode    Mode

SIDT = Store Interrupt Descriptor Table Register
                       | 00001111 | 00000001 | mod 001 r/m |   9       9       b,c   h
SLDT = Store Local Descriptor Table Register
  To Register/Memory   | 00001111 | 00000000 | mod 000 r/m |   N/A     2/2     a     h
SMSW = Store Machine Status Word
                       | 00001111 | 00000001 | mod 100 r/m |   10/13   10/13   b,c   h,l
STR = Store Task Register
  To Register/Memory   | 00001111 | 00000000 | mod 001 r/m |   N/A     2/2     a     h
VERR = Verify Read Access
  Register/Memory      | 00001111 | 00000000 | mod 100 r/m |   N/A     10/11   a     g,h,j,p
VERW = Verify Write Access
                       | 00001111 | 00000000 | mod 101 r/m |   N/A     15/16   a     g,h,j,p


INSTRUCTION NOTES FOR TABLE A-1

Notes a through c apply to 80386 Real Address Mode only:

a. This is a Protected Mode instruction. Attempted execution
   in Real Mode will result in exception 6 (invalid opcode).

b. Exception 13 fault (general protection) will occur in Real
   Mode if an operand reference is made that partially or fully
   extends beyond the maximum CS, DS, ES, FS or GS limit, FFFFH.
   Exception 12 fault (stack segment limit violation or not
   present) will occur in Real Mode if an operand reference is
   made that partially or fully extends beyond the maximum SS
   limit.

c. This instruction may be executed in Real Mode. In Real Mode,
   its purpose is primarily to initialize the CPU for Protected
   Mode.

Notes d through g apply to 80386 Real Address Mode and 80386
Protected Virtual Address Mode:

d. The 80386 uses an early-out multiply algorithm. The actual
   number of clocks depends on the position of the most
   significant bit in the operand (multiplier).

   Clock counts given are minimum to maximum. To calculate
   actual clocks use the following formula:

   Actual Clock = if m <> 0 then max([log2 |m|], 3) + 6 clocks;
   Actual Clock = if m = 0 then 9 clocks (where m is the
   multiplier)

e. An exception may occur, depending on the value of the
   operand.

f. LOCK# is automatically asserted, regardless of the presence
   or absence of the LOCK# prefix.

g. LOCK# is asserted during descriptor table accesses.





Notes h through r apply to 80386 Protected Virtual
Address Mode only:


h. Exception 13 fault (general protection violation)
will occur if the memory operand in CS, DS, ES, FS
or GS cannot be used due to either a segment limit
violation or access rights violation. If a stack limit is
violated, an exception 12 (stack segment limit violation
or not present) occurs.


j. All segment descriptor access in the GDT or LDT 
made by this instruction will automatically assert 
LOCK# to maintain descriptor integrity in multi- 
processor systems. 


k. JMP, CALL, INT, RET and IRET instructions refer- 
ring to another code segment will cause an excep- 
tion 13 (general protection violation) if an ap- 
plicable privilege rule is violated. 


l. An exception 13 fault occurs if CPL is greater than
0 (0 is the most privileged level).


m. An exception 13 fault occurs if CPL is greater than 
IOPL. 


n. The IF bit of the flag register is not updated if CPL 
is greater than IOPL. The IOPL and VM fields of 
the flag register are updated only if CPL = 0. 


o. The PE bit of the MSW (CRO) cannot be reset by 
this instruction. Use MOV into CRO if desiring to 
reset the PE bit. 


p. Any violation of privilege rules as applied to the 
selector operand does not cause a protection excep- 
tion; rather, the zero flag is cleared. 


q. If the coprocessor’s memory operand violates a seg- 
ment limit or segment access rights, an exception 13 
fault (general protection exception) will occur 
before the ESC instruction is executed. An excep- 
tion 12 fault (stack segment limit violation or not 
present) will occur if the stack limit is violated by 
the operand’s starting address. 


r. The destination of a JMP, CALL, INT, RET or
IRET must be in the defined limit of a code segment
or an exception 13 fault (general protection violation)
will occur.
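
The early-out multiply formula in note d can be sketched in a few lines of Python. This is only an illustration: the function name `multiply_clocks` is made up here, and the bracket around log2 |m| is read as a ceiling (round up), which is an assumption about the printed notation rather than something the note states.

```python
def multiply_clocks(m: int) -> int:
    """Estimate clocks for a multiply with multiplier m, per note d.

    Assumption: [log2 |m|] in the note is taken as the ceiling of log2 |m|.
    """
    if m == 0:
        return 9  # note d: 9 clocks when the multiplier is zero
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)) exactly,
    # which avoids floating-point rounding for large operands.
    bits = (abs(m) - 1).bit_length()
    return max(bits, 3) + 6


if __name__ == "__main__":
    for m in (0, 1, 8, 255, -256, 0xFFFF):
        print(m, multiply_clocks(m))
```

Small multipliers all cost the same under this reading, because the max(..., 3) term sets a floor; only multipliers needing more than three bits add clocks.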


INSTRUCTION ENCODING 


OVERVIEW 


All instruction encodings are subsets of the general
instruction format shown in Figure A-1. Instructions consist
of one or two primary opcode bytes, possibly an address
specifier consisting of the “mod r/m” byte and
“scaled index” byte, a displacement if required, and an
immediate data field if required.


Within the primary opcode or opcodes, smaller encod- 
ing fields may be defined. These fields vary according 
to the class of operation. The fields define such informa- 
tion as direction of the operation, size of the displace- 
ments, register encoding, or sign extension. 


Almost all instructions referring to an operand in
memory have an addressing mode byte following the
primary opcode byte(s). This byte, the mod r/m byte,
specifies the address mode to be used. Certain encodings
of the mod r/m byte indicate that a second addressing
byte, the scale-index-base byte, follows the mod r/m
byte to fully specify the addressing mode.


Addressing modes can include a displacement im- 
mediately following the mod r/m byte, or scaled index 
byte. If a displacement is present, the possible sizes are 
8, 16, or 32 bits. 


If the instruction specifies an immediate operand, the
immediate operand follows any displacement bytes. The
immediate operand, if specified, is always the last field
of the instruction.


Figure A-1 illustrates several of the fields that can appear
in an instruction, such as the mod field and the r/m
field, but the figure does not show all fields. Several
smaller fields also appear in certain instructions, sometimes
within the opcode bytes themselves. Table A-2 is
a complete list of all fields appearing in the 80386
instruction set. Further ahead, following Table A-2, are
detailed tables for each field.





 TTTTTTTT | TTTTTTTT | mod TTT r/m | ss index base | d32 | 16 | 8 | none | data32 | 16 | 8 | none
 7      0   7      0   7 6 5 3 2 0   7 6 5 3 2 0

 opcode               “mod r/m”     “s-i-b”          address            immediate
 (one or two bytes)   byte          byte             displacement       data
 (T represents an     register and address           (4, 2, 1 bytes     (4, 2, 1 bytes
 opcode bit.)         mode specifier                 or none)           or none)

Figure A-1. General instruction format
Table A-2. Fields appearing in the 80386 instruction set

Field Name   Description                                               Number of Bits
w            Specifies if Data is Byte or Full Size                    1
             (Full Size is either 16 or 32 Bits)
d            Specifies Direction of Data Operation                     1
s            Specifies if an Immediate Data Field Must be              1
             Sign-Extended
reg          General Register Specifier                                3
mod r/m      Address Mode Specifier (Effective Address can be a        2 for mod;
             General Register)                                         3 for r/m
ss           Scale Factor for Scaled Index Address Mode                2
index        General Register to be used as Index Register             3
base         General Register to be used as Base Register              3
sreg2        Segment Register Specifier for CS, SS, DS, ES             2
sreg3        Segment Register Specifier for CS, SS, DS, ES, FS, GS     3
tttn         For Conditional Instructions, Specifies a Condition       4
             Asserted or a Condition Negated
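
As a rough illustration of the 4-bit tttn field, the sketch below builds the conditional jump and conditional byte set opcodes from a condition number. It relies only on the pattern visible in the opcode tables of this appendix (8-bit jumps are 0111tttn, full-displacement jumps are 00001111 1000tttn, and byte sets are 00001111 1001tttn). The list `CONDITIONS` and the function names are hypothetical helpers, not part of any assembler.

```python
# Condition mnemonics in tttn order, taken from the order of the
# conditional jump rows in Table A-1 (O, NO, B, NB, ... NLE).
CONDITIONS = ["O", "NO", "B", "NB", "E", "NE", "BE", "NBE",
              "S", "NS", "P", "NP", "L", "NL", "LE", "NLE"]


def jcc_short(tttn: int) -> bytes:
    """Opcode byte of an 8-bit-displacement conditional jump: 0111tttn."""
    return bytes([0b0111_0000 | (tttn & 0xF)])


def jcc_full(tttn: int) -> bytes:
    """Opcode bytes of a full-displacement conditional jump: 00001111 1000tttn."""
    return bytes([0b0000_1111, 0b1000_0000 | (tttn & 0xF)])


def setcc(tttn: int) -> bytes:
    """Opcode bytes of a conditional byte set: 00001111 1001tttn."""
    return bytes([0b0000_1111, 0b1001_0000 | (tttn & 0xF)])


def split_tttn(tttn: int) -> tuple:
    """Split tttn into the 3-bit condition (ttt) and the 1-bit sense (n)."""
    return (tttn >> 1) & 0b111, tttn & 1


if __name__ == "__main__":
    for name in ("E", "NE", "NLE"):
        t = CONDITIONS.index(name)
        print(name, jcc_short(t).hex(), jcc_full(t).hex(), setcc(t).hex())
```

Each pair of adjacent conditions (for example E and NE) shares the same ttt bits and differs only in n, which is why the table rows come in asserted/negated pairs.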


32-BIT EXTENSIONS OF THE 
INSTRUCTION SET 


With the 80386, the 86/186/286 instruction set is extended
in two orthogonal directions: 32-bit forms of all
16-bit instructions are added to support the 32-bit data
types, and 32-bit addressing modes are made available
for all instructions referencing memory. This orthogonal
instruction set extension is accomplished by having a
Default (D) bit in the code segment descriptor, and by
having 2 prefixes to the instruction set.


Whether the instruction defaults to operations of 16 bits
or 32 bits depends on the setting of the D bit in the code
segment descriptor, which gives the default length
(either 32 bits or 16 bits) for both operands and effective
addresses when executing that code segment. In the
Real Address Mode or Virtual 8086 Mode, no code segment
descriptors are used, but a D value of 0 is assumed
internally by the 80386 when operating in those modes
(for 16-bit default sizes compatible with the
8086/80186/80286).


Two prefixes, the Operand Size Prefix and the Effective
Address Size Prefix, allow overriding individually the
Default selection of operand size and effective address
size. These prefixes may precede any opcode bytes and
affect only the instruction they precede. If necessary,
one or both of the prefixes may be placed before the opcode
bytes. The presence of the Operand Size Prefix and
the Effective Address Size Prefix will toggle the operand
size or the effective address size, respectively, to the
value “opposite” from the Default setting. For example,
if the default operand size is for 32-bit data operations,
then presence of the Operand Size Prefix toggles the
instruction to 16-bit data operation. As another example,
if the default effective address size is 16 bits, presence
of the Effective Address Size Prefix toggles the instruction
to use 32-bit effective address computations.


These 32-bit extensions are available in all 80386 
modes, including the Real Address Mode or the Virtual 
8086 Mode. In these modes the default is always 16 
bits, so prefixes are needed to specify 32-bit operands or 
addresses. 
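
The toggling rule described above can be sketched as follows. The function `effective_sizes` and its parameters are hypothetical names used only for illustration; the two prefix byte values are the ones listed under PREFIX BYTES in Table A-1 (01100110 for operand size, 01100111 for address size).

```python
OPERAND_SIZE_PREFIX = 0b0110_0110  # Operand Size Prefix byte
ADDRESS_SIZE_PREFIX = 0b0110_0111  # Address Size Prefix byte


def effective_sizes(d_bit: int, prefixes: bytes = b"") -> tuple:
    """Return (operand size, address size) in bits for one instruction.

    d_bit is the Default (D) bit of the code segment descriptor; it is
    taken as 0 in Real Address Mode and Virtual 8086 Mode.
    prefixes holds the prefix bytes that precede the opcode.
    """
    default = 32 if d_bit else 16
    opposite = 16 if d_bit else 32
    operand = opposite if OPERAND_SIZE_PREFIX in prefixes else default
    address = opposite if ADDRESS_SIZE_PREFIX in prefixes else default
    return operand, address


if __name__ == "__main__":
    print(effective_sizes(0))                # 16-bit defaults
    print(effective_sizes(1, b"\x66"))       # 32-bit segment, operand prefix
    print(effective_sizes(0, b"\x66\x67"))   # Real Mode with both prefixes
```

The two sizes are tracked separately because each prefix affects only its own setting, and only for the instruction it precedes.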


Unless specified otherwise, instructions with 8-bit and 
16-bit operands do not affect the contents of the high- 
order bits of the extended registers. 


ENCODING OF INSTRUCTION FIELDS 


Within the instruction are several fields indicating
register selection, addressing mode and so on. The exact
encodings of these fields are defined immediately
ahead.


ENCODING OF OPERAND LENGTH (w) 
FIELD 


For any given instruction performing a data operation,
the instruction is executing as a 32-bit operation or a
16-bit operation. Within the constraints of the operation
size, the w field encodes the operand size as either one
byte or the full operation size, as shown in the table
below.

           Operand Size         Operand Size
w Field    During 16-Bit        During 32-Bit
           Data Operations      Data Operations
   0       8 Bits               8 Bits
   1       16 Bits              32 Bits


ENCODING OF THE GENERAL 
REGISTER (reg) FIELD 


The general register is specified by the reg field, which
may appear in the primary opcode bytes, or as the reg
field of the “mod r/m” byte, or as the r/m field of the
“mod r/m” byte.


Encoding of reg Field When w Field
Is not Present in Instruction

             Register Selected    Register Selected
reg Field    During 16-Bit        During 32-Bit
             Data Operations      Data Operations
   000       AX                   EAX
   001       CX                   ECX
   010       DX                   EDX
   011       BX                   EBX
   100       SP                   ESP
   101       BP                   EBP
   110       SI                   ESI
   111       DI                   EDI


Encoding of reg Field When w Field
Is Present in Instruction

Register Specified by reg Field
During 16-Bit Data Operations

             Function of w Field
reg          (when w = 0)    (when w = 1)
000          AL              AX
001          CL              CX
010          DL              DX
011          BL              BX
100          AH              SP
101          CH              BP
110          DH              SI
111          BH              DI

Register Specified by reg Field
During 32-Bit Data Operations

             Function of w Field
reg          (when w = 0)    (when w = 1)
000          AL              EAX
001          CL              ECX
010          DL              EDX
011          BL              EBX
100          AH              ESP
101          CH              EBP
110          DH              ESI
111          BH              EDI
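
The three reg tables above reduce to a small lookup, sketched here in Python. The function name `decode_reg` and its arguments are hypothetical; passing `w=None` stands for an instruction that has no w field.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
REG32 = ["E" + name for name in REG16]  # EAX, ECX, ... EDI


def decode_reg(reg: int, operand_size: int, w=None) -> str:
    """Name the register selected by a 3-bit reg field.

    operand_size is 16 or 32 (the size of the data operation).
    w is 0 or 1 when the instruction has a w field, else None.
    """
    if not 0 <= reg <= 7:
        raise ValueError("reg is a 3-bit field")
    if w == 0:
        return REG8[reg]  # byte operand regardless of operation size
    return REG32[reg] if operand_size == 32 else REG16[reg]


if __name__ == "__main__":
    print(decode_reg(0b110, 16))        # no w field, 16-bit operation
    print(decode_reg(0b100, 32, w=0))   # byte register
    print(decode_reg(0b111, 32, w=1))   # full-size 32-bit register
```

Note that encodings 100 through 111 name AH, CH, DH and BH when w = 0 but SP, BP, SI and DI (or their 32-bit forms) otherwise, so the w field must be known before the reg field can be interpreted.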


ENCODING OF THE SEGMENT 
REGISTER (sreg) FIELD 


The sreg field in certain instructions is a 2-bit field al- 
lowing one of the four 80286 segment registers to be 
specified. The sreg field in other instructions is a 3-bit 
field, allowing the 80386 FS and GS segment registers 
to be specified. 


2-Bit sreg2 Field

2-Bit          Segment
sreg2 Field    Register
               Selected
00             ES
01             CS
10             SS
11             DS

3-Bit sreg3 Field

3-Bit          Segment
sreg3 Field    Register
               Selected
000            ES
001            CS
010            SS
011            DS
100            FS
101            GS
110            do not use
111            do not use


ENCODING OF ADDRESS MODE 


Except for special instructions, such as PUSH or POP,
where the addressing mode is pre-determined, the addressing
mode for the current instruction is specified by
addressing bytes following the primary opcode. The
primary addressing byte is the “mod r/m” byte, and a
second byte of addressing information, the “s-i-b”
(scale-index-base) byte, can be specified.


The s-i-b byte (scale-index-base byte) is specified when
using 32-bit addressing mode and the “mod r/m” byte
has r/m = 100 and mod = 00, 01 or 10. When the s-i-b
byte is present, the 32-bit addressing mode is a function
of the mod, ss, index, and base fields.


The primary addressing byte, the “mod r/m” byte, also
contains three bits (shown as TTT in Figure A-1) sometimes
used as an extension of the primary opcode. The
three bits, however, may also be used as a register field
(reg).


When calculating an effective address, either 16-bit
addressing or 32-bit addressing is used. 16-bit addressing
uses 16-bit address components to calculate the effective
address while 32-bit addressing uses 32-bit address
components to calculate the effective address. When
16-bit addressing is used, the “mod r/m” byte is interpreted
as a 16-bit addressing mode specifier. When 32-bit
addressing is used, the “mod r/m” byte is interpreted as a
32-bit addressing mode specifier.


Tables on the following three pages define all encodings 
of all 16-bit addressing modes and 32-bit addressing 
modes. 
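
Before turning to the tables, the field layout of the “mod r/m” byte and the rule for when an s-i-b byte follows can be sketched in Python. The function names `split_modrm` and `has_sib` are hypothetical; the bit positions follow Figure A-1 (mod in bits 7-6, the TTT/reg bits in 5-3, r/m in 2-0).

```python
def split_modrm(byte: int) -> tuple:
    """Split a mod r/m byte into its (mod, reg, r/m) fields."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111   # the TTT bits: opcode extension or reg
    rm = byte & 0b111
    return mod, reg, rm


def has_sib(byte: int, address_size: int) -> bool:
    """True when an s-i-b byte follows this mod r/m byte.

    Per the text: 32-bit addressing, r/m = 100, and mod = 00, 01 or 10.
    """
    mod, _, rm = split_modrm(byte)
    return address_size == 32 and rm == 0b100 and mod != 0b11


if __name__ == "__main__":
    for b in (0x44, 0xC4, 0x05):
        print(f"{b:08b}", split_modrm(b), has_sib(b, 32), has_sib(b, 16))
```

With mod = 11 the r/m field names a register rather than a memory operand, so no s-i-b byte is present even when r/m = 100.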
Encoding of 16-bit Address Mode with “mod r/m” Byte

mod r/m   Effective Address        mod r/m   Effective Address

00 000    DS:[BX + SI]             10 000    DS:[BX + SI + d16]
00 001    DS:[BX + DI]             10 001    DS:[BX + DI + d16]
00 010    SS:[BP + SI]             10 010    SS:[BP + SI + d16]
00 011    SS:[BP + DI]             10 011    SS:[BP + DI + d16]
00 100    DS:[SI]                  10 100    DS:[SI + d16]
00 101    DS:[DI]                  10 101    DS:[DI + d16]
00 110    DS:d16                   10 110    SS:[BP + d16]
00 111    DS:[BX]                  10 111    DS:[BX + d16]

01 000    DS:[BX + SI + d8]        11 000    register (see below)
01 001    DS:[BX + DI + d8]        11 001    register (see below)
01 010    SS:[BP + SI + d8]        11 010    register (see below)
01 011    SS:[BP + DI + d8]        11 011    register (see below)
01 100    DS:[SI + d8]             11 100    register (see below)
01 101    DS:[DI + d8]             11 101    register (see below)
01 110    SS:[BP + d8]             11 110    register (see below)
01 111    DS:[BX + d8]             11 111    register (see below)


Register Specified by r/m
During 16-Bit Data Operations

              Function of w Field
mod r/m   (when w = 0)   (when w = 1)
11 000    AL             AX
11 001    CL             CX
11 010    DL             DX
11 011    BL             BX
11 100    AH             SP
11 101    CH             BP
11 110    DH             SI
11 111    BH             DI


Register Specified by r/m
During 32-Bit Data Operations

              Function of w Field
mod r/m   (when w = 0)   (when w = 1)
11 000    AL             EAX
11 001    CL             ECX
11 010    DL             EDX
11 011    BL             EBX
11 100    AH             ESP
11 101    CH             EBP
11 110    DH             ESI
11 111    BH             EDI
Encoding of 32-bit Address Mode with “mod r/m” byte (no “s-i-b” byte present):

mod r/m   Effective Address        mod r/m   Effective Address

00 000    DS:[EAX]                 10 000    DS:[EAX + d32]
00 001    DS:[ECX]                 10 001    DS:[ECX + d32]
00 010    DS:[EDX]                 10 010    DS:[EDX + d32]
00 011    DS:[EBX]                 10 011    DS:[EBX + d32]
00 100    s-i-b is present         10 100    s-i-b is present
00 101    DS:d32                   10 101    SS:[EBP + d32]
00 110    DS:[ESI]                 10 110    DS:[ESI + d32]
00 111    DS:[EDI]                 10 111    DS:[EDI + d32]

01 000    DS:[EAX + d8]            11 000    register (see below)
01 001    DS:[ECX + d8]            11 001    register (see below)
01 010    DS:[EDX + d8]            11 010    register (see below)
01 011    DS:[EBX + d8]            11 011    register (see below)
01 100    s-i-b is present         11 100    register (see below)
01 101    SS:[EBP + d8]            11 101    register (see below)
01 110    DS:[ESI + d8]            11 110    register (see below)
01 111    DS:[EDI + d8]            11 111    register (see below)


Register Specified by reg or r/m
during 16-Bit Data Operations:

              function of w field
mod r/m   (when w = 0)   (when w = 1)
11 000    AL             AX
11 001    CL             CX
11 010    DL             DX
11 011    BL             BX
11 100    AH             SP
11 101    CH             BP
11 110    DH             SI
11 111    BH             DI


Register Specified by reg or r/m
during 32-Bit Data Operations:

              function of w field
mod r/m   (when w = 0)   (when w = 1)
11 000    AL             EAX
11 001    CL             ECX
11 010    DL             EDX
11 011    BL             EBX
11 100    AH             ESP
11 101    CH             EBP
11 110    DH             ESI
11 111    BH             EDI


80386 Programming Guide 


Encoding of 32-bit Address Mode (“mod r/m” byte and “s-i-b” byte present):

mod base   Effective Address

00 000     DS:[EAX + (scaled index)]
00 001     DS:[ECX + (scaled index)]
00 010     DS:[EDX + (scaled index)]
00 011     DS:[EBX + (scaled index)]
00 100     SS:[ESP + (scaled index)]
00 101     DS:[d32 + (scaled index)]
00 110     DS:[ESI + (scaled index)]
00 111     DS:[EDI + (scaled index)]

01 000     DS:[EAX + (scaled index) + d8]
01 001     DS:[ECX + (scaled index) + d8]
01 010     DS:[EDX + (scaled index) + d8]
01 011     DS:[EBX + (scaled index) + d8]
01 100     SS:[ESP + (scaled index) + d8]
01 101     SS:[EBP + (scaled index) + d8]
01 110     DS:[ESI + (scaled index) + d8]
01 111     DS:[EDI + (scaled index) + d8]

10 000     DS:[EAX + (scaled index) + d32]
10 001     DS:[ECX + (scaled index) + d32]
10 010     DS:[EDX + (scaled index) + d32]
10 011     DS:[EBX + (scaled index) + d32]
10 100     SS:[ESP + (scaled index) + d32]
10 101     SS:[EBP + (scaled index) + d32]
10 110     DS:[ESI + (scaled index) + d32]
10 111     DS:[EDI + (scaled index) + d32]

ss         Scale Factor
00         x1
01         x2
10         x4
11         x8

index      Index Register
000        EAX
001        ECX
010        EDX
011        EBX
100        no index reg**
101        EBP
110        ESI
111        EDI

**IMPORTANT NOTE:
When index field is 100, indicating “no index register,” then
ss field MUST equal 00. If index is 100 and ss does not
equal 00, the effective address is undefined.

NOTE:
Mod field in “mod r/m” byte; ss, index, base fields in
“s-i-b” byte.
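
As a rough sketch of how the s-i-b table is applied, the Python function below computes the 32-bit offset part of an effective address from the mod field, an s-i-b byte, and a set of register values. The names sib_effective_address, REG_NAMES, and the regs dictionary are hypothetical and exist only for this illustration. Segment selection is not modeled, and the caller is assumed to pass the displacement already sign-extended.

```python
# Register numbering shared by the base and index fields (index 100 means "no index").
REG_NAMES = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def sib_effective_address(mod, sib, regs, disp=0):
    """Return the 32-bit offset selected by mod (00, 01 or 10) and an s-i-b byte.

    regs maps register names to values; disp is the d8/d32 displacement
    (for mod = 00 with base = 101 it is the d32 that replaces the base).
    """
    ss = (sib >> 6) & 0b11
    index = (sib >> 3) & 0b111
    base = sib & 0b111

    if index == 0b100:
        # No index register: ss must be 00, otherwise the address is undefined.
        if ss != 0:
            raise ValueError("index = 100 requires ss = 00")
        scaled = 0
    else:
        scaled = regs[REG_NAMES[index]] << ss   # x1, x2, x4 or x8

    if mod == 0b00 and base == 0b101:
        base_value = 0                           # d32 takes the place of the base
    else:
        base_value = regs[REG_NAMES[base]]

    return (base_value + scaled + disp) & 0xFFFFFFFF


if __name__ == "__main__":
    regs = dict.fromkeys(REG_NAMES, 0)
    regs.update(EAX=0x1000, EBX=0x10)
    # ss = 10 (x4), index = EBX, base = EAX
    print(hex(sib_effective_address(0b00, 0b10011000, regs)))
```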





ENCODING OF OPERATION DIRECTION
(d) FIELD


In many two-operand instructions the d field is present
to indicate which operand is considered the source and
which is the destination.


d    Direction of Operation

0    Register/Memory <-- Register
     “reg” field indicates source operand;
     “mod r/m” or “mod ss index base” indicates
     destination operand

1    Register <-- Register/Memory
     “reg” field indicates destination operand;
     “mod r/m” or “mod ss index base” indicates
     source operand





ENCODING OF SIGN-EXTEND
(s) FIELD


The s field occurs primarily in instructions with
immediate data fields. The s field has an effect only if the
size of the immediate data is 8 bits and is being placed
in a 16-bit or 32-bit destination.


s    Effect on Immediate Data8           Effect on Immediate Data 16/32

0    None                                None
1    Sign-extend data8 to fill           None
     16-bit or 32-bit destination
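
The effect of the s field on an 8-bit immediate can be illustrated with a short Python sketch. The function name extend_imm8 and its parameters are assumptions made for this example only.

```python
def extend_imm8(imm8, s, size_bits):
    """Return the value placed in a 16- or 32-bit destination.

    imm8 is the raw 8-bit immediate (0..255), s is the sign-extend
    field, and size_bits is 16 or 32.
    """
    if s == 1 and imm8 & 0x80:
        # Copy the sign bit into every bit above bit 7.
        upper_bits = ((1 << size_bits) - 1) & ~0xFF
        return imm8 | upper_bits
    return imm8


if __name__ == "__main__":
    print(hex(extend_imm8(0xFE, 1, 16)))
    print(hex(extend_imm8(0xFE, 1, 32)))
```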


ENCODING OF CONDITIONAL TEST
(tttn) FIELD


For the conditional instructions (conditional jumps and
set on condition), tttn is encoded with n indicating to
use the condition (n = 0) or its negation (n = 1), and ttt
giving the condition to test.


Mnemonic   Condition                             tttn

O          Overflow                              0000
NO         No Overflow                           0001
B/NAE      Below/Not Above or Equal              0010
NB/AE      Not Below/Above or Equal              0011
E/Z        Equal/Zero                            0100
NE/NZ      Not Equal/Not Zero                    0101
BE/NA      Below or Equal/Not Above              0110
NBE/A      Not Below or Equal/Above              0111
S          Sign                                  1000
NS         Not Sign                              1001
P/PE       Parity/Parity Even                    1010
NP/PO      Not Parity/Parity Odd                 1011
L/NGE      Less Than/Not Greater or Equal        1100
NL/GE      Not Less Than/Greater or Equal        1101
LE/NG      Less Than or Equal/Not Greater Than   1110
NLE/G      Not Less Than or Equal/Greater Than   1111
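
A minimal Python sketch of how a tttn value could be evaluated against the status flags follows. The function name condition_holds is hypothetical. The flag expressions are the standard 8086-family definitions of each condition, including ttt = 111 (LE/NG), and are stated here as background rather than taken from the table itself.

```python
def condition_holds(tttn, cf=0, zf=0, sf=0, of=0, pf=0):
    """Evaluate a 4-bit tttn condition code against flag values (0 or 1)."""
    ttt = (tttn >> 1) & 0b111   # which condition to test
    n = tttn & 1                # 0 = use the condition, 1 = use its negation
    base = [
        of,                 # 000  O
        cf,                 # 001  B/NAE
        zf,                 # 010  E/Z
        cf | zf,            # 011  BE/NA
        sf,                 # 100  S
        pf,                 # 101  P/PE
        sf ^ of,            # 110  L/NGE
        (sf ^ of) | zf,     # 111  LE/NG
    ][ttt]
    return bool(base) != bool(n)


if __name__ == "__main__":
    print(condition_holds(0b0100, zf=1))   # E/Z with ZF set
    print(condition_holds(0b0101, zf=1))   # NE/NZ with ZF set
```

The low bit simply inverts the result, so each pair of adjacent encodings (for example 0100 and 0101) tests opposite outcomes of the same flag expression.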


ENCODING OF CONTROL OR DEBUG OR
TEST REGISTER (eee) FIELD


For the loading and storing of the Control, Debug and
Test registers.


When Interpreted as Control Register Field

eee Code   Reg Name
000        CR0
010        CR2
011        CR3

Do not use any other encoding


When Interpreted as Debug Register Field

eee Code   Reg Name
000        DR0
001        DR1
010        DR2
011        DR3
110        DR6
111        DR7

Do not use any other encoding


When Interpreted as Test Register Field

eee Code   Reg Name
110        TR6
111        TR7

Do not use any other encoding



Complete 80386 Flag Cross-Reference


KEY TO CODES

T        instruction tests flag
M        instruction modifies flag
         (either sets or resets depending on operands)
0        instruction resets flag
1        instruction sets flag
—        instruction’s effect on flag is undefined
R        instruction restores prior value of flag
blank    instruction does not affect flag


Table columns: Instruction, OF, SF, ZF, AF, PF, CF, TF, IF, DF, NT, RF

Instructions listed: AAA; AAD; AAM; AAS; ADC; ADD; AND; ARPL;
BOUND; BSF/BSR; BT/BTS/BTR/BTC; CALL; CBW; CLC; CLD; CLI; CLTS;
CMC; CMP; CMPS; CWD; DAA; DAS; DEC; DIV; ENTER; ESC; HLT; IDIV;
IMUL; IN; INC; INS; INT; INTO; IRET; Jcond; JCXZ; JMP; LAHF; LAR;
LDS/LES/LSS/LFS/LGS; LEA; LEAVE; LGDT/LIDT/LLDT/LMSW; LOCK; LODS;
LOOP; LOOPE/LOOPNE; LSL; LTR; MOV; MOV control, debug; MOVS;
MOVSX/MOVZX; MUL; NEG; NOP; NOT; OR; OUT; OUTS; POP/POPA; POPF;
PUSH/PUSHA/PUSHF; RCL/RCR 1; RCL/RCR count; REP/REPE/REPNE; RET;
ROL/ROR 1; ROL/ROR count; SAHF; SAL/SAR/SHL/SHR 1;
SAL/SAR/SHL/SHR count; SBB; SCAS; SET cond; SGDT/SIDT/SLDT/SMSW;
SHLD/SHRD; STC; STD; STI; STOS; STR; SUB; TEST; VERR/VERW; WAIT;
XCHG; XLAT; XOR
80386 Status Flag Summary


STATUS FLAGS’ FUNCTIONS


Bit   Name   Function

0     CF     Carry Flag — Set on high-order bit carry or borrow; cleared otherwise.

2     PF     Parity Flag — Set if low-order eight bits of result contain an even
             number of 1 bits; cleared otherwise.

4     AF     Adjust Flag — Set on carry from or borrow to the low-order four bits
             of AL; cleared otherwise. Used for decimal arithmetic.

6     ZF     Zero Flag — Set if result is zero; cleared otherwise.

7     SF     Sign Flag — Set equal to high-order bit of result (0 if positive,
             1 if negative).

11    OF     Overflow Flag — Set if result is too large a positive number or too
             small a negative number (excluding sign-bit) to fit in destination
             operand; cleared otherwise.
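
To make the flag definitions concrete, the following Python sketch computes the six status flags for an 8-bit addition. It is an illustration only: the function name add8_flags is an assumption, and an 8-bit ADD is chosen simply because it is the smallest case that exercises every flag.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags) as an ADD would set them."""
    full = a + b
    result = full & 0xFF
    flags = {
        "CF": int(full > 0xFF),                               # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),           # even number of 1 bits
        "AF": int(((a & 0xF) + (b & 0xF)) > 0xF),             # carry out of bit 3
        "ZF": int(result == 0),
        "SF": result >> 7,                                    # copy of high-order bit
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0), # signed overflow
    }
    return result, flags


if __name__ == "__main__":
    print(add8_flags(0x7F, 0x01))
    print(add8_flags(0xFF, 0x01))
```

Adding 1 to 7FH overflows the signed range (OF set, CF clear), while adding 1 to 0FFH carries out of the byte (CF and ZF set, OF clear), which shows why signed and unsigned overflow need separate flags.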








KEY TO CODES

T        instruction tests flag
M        instruction modifies flag
         (either sets or resets depending on operands)
0        instruction resets flag
—        instruction’s effect on flag is undefined
blank    instruction does not affect flag


Instruction


AAA 
AAS 


AAD 
AAM 


DAA 
DAS 


ADC 
ADD 
SBB 
SUB 
CMP 
CMPS 
SCAS
NEG 


DEC 
INC 


IMUL 
MUL 


RCL/RCR 1 

RCL/RCR count 
ROL/ROR 1 

ROL/ROR count 
SAL/SAR/SHL/SHR 1 
SAL/SAR/SHL/SHR count 


SHLD/SHRD 
BSF/BSR 
BT/BTS/BTR/BTC 


AND 
OR 
TEST
XOR 
Summary of 80386 Descriptors


Table D-1
Summary of 80386 Descriptors

Code   Type of Segment or Gate

0      -reserved
1      Available 286 TSS
2      LDT
3      Busy 286 TSS
4      Call Gate
5      Task Gate
6      286 Interrupt Gate
7      286 Trap Gate
8      -reserved
9      Available 386 TSS
A      -reserved
B      Busy 386 TSS
C      386 Call Gate
D      -reserved
E      386 Interrupt Gate
F      386 Trap Gate
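
The table can be treated as a lookup keyed by the 4-bit type field of a system descriptor. The Python sketch below assumes the codes run from 0 through F in the order listed, and that the type field occupies the low four bits of the access-rights byte with the S bit (bit 4) clear for system segments and gates. The names SYSTEM_TYPES and system_descriptor_type are hypothetical.

```python
SYSTEM_TYPES = {
    0x0: "reserved",           0x1: "Available 286 TSS",
    0x2: "LDT",                0x3: "Busy 286 TSS",
    0x4: "Call Gate",          0x5: "Task Gate",
    0x6: "286 Interrupt Gate", 0x7: "286 Trap Gate",
    0x8: "reserved",           0x9: "Available 386 TSS",
    0xA: "reserved",           0xB: "Busy 386 TSS",
    0xC: "386 Call Gate",      0xD: "reserved",
    0xE: "386 Interrupt Gate", 0xF: "386 Trap Gate",
}


def system_descriptor_type(access_rights):
    """Name the descriptor type encoded in an access-rights byte.

    Layout assumed: P (bit 7), DPL (bits 6..5), S (bit 4), TYPE (bits 3..0).
    Returns None for code and data segments (S = 1).
    """
    if access_rights & 0x10:
        return None
    return SYSTEM_TYPES[access_rights & 0x0F]


if __name__ == "__main__":
    print(system_descriptor_type(0x89))   # present, DPL 0, type 9
    print(system_descriptor_type(0x8E))   # present, DPL 0, type E
```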
Figure D-1

DATA SEGMENT DESCRIPTOR

SEGMENT BASE 15..0                    SEGMENT LIMIT 15..0

A   — ACCESSED                        E — EXPAND-DOWN
AVL — AVAILABLE FOR PROGRAMMER USE    G — GRANULARITY
B   — BIG                             P — SEGMENT PRESENT
C   — CONFORMING                      R — READABLE
D   — DEFAULT                         W — WRITABLE
DPL — DESCRIPTOR PRIVILEGE LEVEL

Data and executable segment descriptors.


Figure D-2

SYSTEM SEGMENT DESCRIPTOR

SEGMENT BASE 15..0                    SEGMENT LIMIT 15..0

AVL — AVAILABLE FOR PROGRAMMER USE
DPL — DESCRIPTOR PRIVILEGE LEVEL
P   — SEGMENT PRESENT

System segment descriptor.
BASE 31..24                           BASE 23..16

BASE 15..0                            LIMIT 15..0

Figure D-3
Task state segment descriptor for 32-bit task state segment.


80386 TASK GATE

80386 INTERRUPT GATE

OFFSET 31..16
SELECTOR                              OFFSET 15..0

80386 TRAP GATE

SELECTOR                              OFFSET 15..0

Figure D-4
Gate descriptors.
AVAILABLE     DPL     TYPE     AVAILABLE

AVAILABLE

Figure D-5
Not-present descriptor.


GLOBAL DESCRIPTOR TABLE               LOCAL DESCRIPTOR TABLE

Figure D-6
Global and local descriptor tables.
Figure D-7
Interrupt descriptor table.
Differences Between 80286 and
80386 Processors


EXECUTING 80286 PROTECTED- 
MODE CODE 


80286 CODE EXECUTES AS A SUBSET 
OF THE 80386 


In general, programs designed for execution in protected 
mode on an 80286 execute without modification on the 
80386, because the features of the 80286 are a subset of 
those of the 80386. 


All the descriptors used by the 80286 are supported by 
the 80386 as long as the Intel-reserved word (last word) 
of the 80286 descriptor is zero. 


The descriptors for data segments, executable segments,
local descriptor tables, and task gates are common to
both the 80286 and the 80386. Other 80286 descriptors
(TSS segment, call gate, interrupt gate, and trap gate)
are supported by the 80386. The 80386 also has new
versions of descriptors for TSS segment, call gate,
interrupt gate, and trap gate that support the 32-bit nature of
the 80386. Both sets of descriptors can be used
simultaneously in the same system.


For those descriptors that are common to both the 80286
and the 80386, the presence of zeros in the final word
causes the 80386 to interpret these descriptors exactly as
the 80286 does; for example:


Base Address      The high-order eight bits of the 32-bit
                  base address are zero, limiting
                  base addresses to 24 bits.

Limit             The high-order four bits of the limit
                  field are zero, restricting the value of
                  the limit field to 64K.

Granularity bit   The granularity bit is zero, which
                  implies that the value of the 16-bit limit
                  is interpreted in units of one byte.

B-bit             In a data-segment descriptor, the B-bit
                  is zero, implying that the segment
                  is no larger than 64K bytes.

D-bit             In an executable-segment descriptor,
                  the D-bit is zero, implying that 16-bit
                  addressing and operands are the
                  default.


For formats of these descriptors and documentation of
their use refer to the iAPX 286 Programmer's Reference
Manual.
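
The effect of a zero final word can be sketched in Python by unpacking an 8-byte segment descriptor. The byte layout used here is the standard 80386 descriptor layout (limit 15..0, base 15..0, base 23..16, access rights, then G, B/D, AVL, and limit 19..16, then base 31..24); the function name parse_descriptor and the dictionary keys are assumptions made for this example.

```python
import struct


def parse_descriptor(raw):
    """Decode an 8-byte code or data segment descriptor into its fields."""
    limit_lo, base_lo, base_mid, access, flags_limit, base_hi = struct.unpack(
        "<HHBBBB", raw
    )
    return {
        "base": base_lo | (base_mid << 16) | (base_hi << 24),
        "limit": limit_lo | ((flags_limit & 0x0F) << 16),
        "access": access,
        "granularity": (flags_limit >> 7) & 1,      # G bit
        "big_or_default": (flags_limit >> 6) & 1,   # B bit (data) or D bit (code)
        # A zero final word gives exactly the 80286 interpretation.
        "is_286_style": flags_limit == 0 and base_hi == 0,
    }


if __name__ == "__main__":
    d286 = bytes([0xFF, 0xFF, 0x78, 0x56, 0x34, 0x92, 0x00, 0x00])
    print(parse_descriptor(d286))
```

With the last two bytes zero, the base cannot exceed 24 bits, the limit cannot exceed 0FFFFH, and both G and B/D read as zero, which matches the list above.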
TWO WAYS TO EXECUTE
80286 TASKS


When porting 80286 programs to the 80386, there are 
two cases to consider: 


1. Porting an entire 80286 system to the 80386, com- 
plete with 80286 operating system, loader, and sys- 
tem builder. 


In this case, all tasks will have 80286 TSSs. The 
80386 is being used as a faster 286. 


2. Porting selected 80286 applications to run in an 
80386 environment with an 80386 operating system, 
loader, and system builder. 


In this case, the TSSs used to represent 80286 tasks
should be changed to 80386 TSSs. It is theoretically
possible to mix 80286 and 80386 TSSs, but the
benefits are slight and the problems are great. It is
recommended that all tasks in an 80386 software
system have 80386 TSSs. It is not necessary to change
the 80286 object modules themselves; TSSs are
usually constructed by the operating system, by the
loader, or by the system builder. Refer to Chapter 16
for further discussion of the interface between 16-bit
and 32-bit code.

DIFFERENCES FROM 80286 


The few differences that do exist primarily affect
operating system code.


WRAPAROUND OF 80286 24-BIT PHYSI- 
CAL ADDRESS SPACE 


With the 80286, any base and offset combination that 
addresses beyond 16M bytes wraps around to the first 
megabyte of the 80286 address space. With the 80386, 
since it has a greater physical address space, any such 
address falls into the 17th megabyte. In the unlikely 
event that any software depends on this anomaly, the 
same effect can be simulated on the 80386 by using 
paging to map the first 64K bytes of the 17th megabyte 
of logical addresses to physical addresses in the first 
megabyte. 
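
The wraparound difference comes down to how many address bits are kept. The Python sketch below is a simplified model under stated assumptions: paging is disabled on the 80386, and the function names physical_286 and physical_386 are invented for this example.

```python
def physical_286(base, offset):
    """80286: the sum of base and offset is truncated to 24 address bits."""
    return (base + offset) & 0xFFFFFF


def physical_386(base, offset):
    """80386 (paging disabled): the sum is kept to 32 address bits."""
    return (base + offset) & 0xFFFFFFFF


if __name__ == "__main__":
    base, offset = 0xFFFFF0, 0x20      # sum lies just past 16M bytes
    print(hex(physical_286(base, offset)))
    print(hex(physical_386(base, offset)))
```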


RESERVED WORD OF DESCRIPTOR 


Because the 80386 uses the contents of the reserved
word (last word) of every descriptor, 80286 programs
that place values in this word may not execute correctly
on the 80386.


NEW DESCRIPTOR TYPE CODES 


Operating-system code that manages space in descriptor
tables often uses an invalid value in the access-rights
field of descriptor-table entries to identify unused
entries. Access right values of 80H and 00H remain
invalid for both the 80286 and 80386. Other values that
were invalid for the 80286 may be valid for the
80386 because of the additional descriptor types defined
by the 80386.


RESTRICTED SEMANTICS 
OF LOCK 


The 80286 processor implements the bus lock function
differently than the 80386. Programs that use forms of
memory locking specific to the 80286 may not execute
properly when transported to a specific application of
the 80386.


The LOCK prefix and its corresponding output signal
should only be used to prevent other bus masters from
interrupting a data movement operation. LOCK may
only be used with the following 80386 instructions
when they modify memory. An undefined-opcode
exception results from using LOCK before any other
instruction.


• Bit test and change: BTS, BTR, BTC.

• Exchange: XCHG.

• One-operand arithmetic and logical: INC, DEC, NOT,
  and NEG.

• Two-operand arithmetic and logical: ADD, ADC,
  SUB, SBB, AND, OR, XOR.

A locked instruction is guaranteed to lock only the area
of memory defined by the destination operand, but may
lock a larger memory area. For example, typical 8086
and 80286 configurations lock the entire physical
memory space. With the 80386, the defined area of
memory is guaranteed to be locked against access by a
processor executing a locked instruction on exactly the
same memory area, i.e., an operand with identical
starting address and identical length.
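
The restriction on LOCK can be expressed as a simple membership test. The Python sketch below is an illustration only; LOCKABLE and lock_prefix_allowed are hypothetical names, and the check models just the two conditions stated above (an instruction from the list, with a memory destination).

```python
# Instructions that accept a LOCK prefix on the 80386 when they modify memory.
LOCKABLE = {
    "BTS", "BTR", "BTC",                              # bit test and change
    "XCHG",                                           # exchange
    "INC", "DEC", "NOT", "NEG",                       # one-operand
    "ADD", "ADC", "SUB", "SBB", "AND", "OR", "XOR",   # two-operand
}


def lock_prefix_allowed(mnemonic, destination_is_memory):
    """Return True if LOCK may precede the instruction without raising
    the undefined-opcode exception (exception 6)."""
    return mnemonic.upper() in LOCKABLE and destination_is_memory


if __name__ == "__main__":
    print(lock_prefix_allowed("xchg", True))
    print(lock_prefix_allowed("MOV", True))
```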


ADDITIONAL EXCEPTIONS


The 80386 defines new exceptions that can occur even 
in systems designed for the 80286. 


• Exception #6: invalid opcode

  This exception can result from improper use of the
  LOCK instruction.

• Exception #14: page fault

  This exception may occur in an 80286 program if the
  operating system enables paging. Paging can be used
  in a system with 80286 tasks as long as all tasks use
  the same page directory. Because there is no place in
  an 80286 TSS to store the PDBR, switching to an
  80286 task does not change the value of PDBR. Tasks
  ported from the 80286 should be given 80386 TSSs so
  they can take full advantage of paging.


DIFFERENCES FROM 80286 
REAL-ADDRESS MODE 


The few differences that exist between 80386
real-address mode and 80286 real-address mode are not
likely to affect any existing 80286 programs except possibly
the system initialization procedures.


BUS LOCK 


The 80286 processor implements the bus lock function
differently than the 80386. Programs that use forms of
memory locking specific to the 80286 may not execute
properly if transported to a specific application of the
80386.


The LOCK prefix and its corresponding output signal 
should only be used to prevent other bus masters from 
interrupting a data movement operation. LOCK may 
only be used with the following 80386 instructions 
when they modify memory. An undefined-opcode ex- 
ception results from using LOCK before any other in- 
struction. 


• Bit test and change: BTS, BTR, BTC.

• Exchange: XCHG.

• One-operand arithmetic and logical: INC, DEC, NOT,
  and NEG.

• Two-operand arithmetic and logical: ADD, ADC,
  SUB, SBB, AND, OR, XOR.


A locked instruction is guaranteed to lock only the area
of memory defined by the destination operand, but may
lock a larger memory area. For example, typical 8086
and 80286 configurations lock the entire physical
memory space. With the 80386, the defined area of
memory is guaranteed to be locked against access by a
processor executing a locked instruction on exactly the
same memory area, i.e., an operand with identical
starting address and identical length.


LOCATION OF FIRST INSTRUCTION 


The starting location is 0FFFFFFF0H (sixteen bytes
from end of 32-bit address space) on the 80386 rather
than 0FFFFF0H (sixteen bytes from end of 24-bit
address space) as on the 80286. Many 80286 ROM
initialization programs will work correctly in this new
environment. Others can be made to work correctly with
external hardware that redefines the signals on A31-20.


INITIAL VALUES OF GENERAL 
REGISTERS 


On the 80386, certain general registers may contain
different values after RESET than on the 80286. This
should not cause compatibility problems, because the
content of 8086 registers after RESET is undefined. If
self-test is requested during the reset sequence and
errors are detected in the 80386 unit, EAX will contain a
nonzero value. EDX contains the component and
revision identifier. Refer to Chapter 10 for more
information.


MSW INITIALIZATION 


The 80286 initializes the MSW register to FFF0H, but
the 80386 initializes this register to 0000H. This
difference should have no effect, because the bits that are
different are undefined on the 80286. Programs that
read the value of the MSW will behave differently on
the 80386 only if they depend on the setting of the
undefined, high-order bits.


DIFFERENCES FROM 80286 
REAL-ADDRESS MODE 


The 80286 processor implements the bus lock function 
differently than the 80386. This fact may or may not be 
apparent in 8086 programs, depending on how the V86 
monitor handles the LOCK prefix. LOCKed instructions 
are sensitive to IOPL; therefore, software designers can 
choose to emulate its function. If, however, 8086 
programs are allowed to execute LOCK directly, 
programs that use forms of memory locking specific to 
the 8086 may not execute properly when transported to 
a specific application of the 80386. 


The LOCK prefix and its corresponding output signal 
should only be used to prevent other bus masters from 
interrupting a data movement operation. LOCK may 
only be used with the following 80386 instructions 
when they modify memory. An undefined-opcode ex- 
ception results from using LOCK before any other in- 
struction. 
• Bit test and change: BTS, BTR, BTC.

• Exchange: XCHG.

• One-operand arithmetic and logical: INC, DEC, NOT,
  and NEG.

• Two-operand arithmetic and logical: ADD, ADC,
  SUB, SBB, AND, OR, XOR.


A locked instruction is guaranteed to lock only the area
of memory defined by the destination operand, but may
lock a larger memory area. For example, typical 8086
and 80286 configurations lock the entire physical
memory space. With the 80386, the defined area of
memory is guaranteed to be locked against access by a
processor executing a locked instruction on exactly the
same memory area, i.e., an operand with identical
starting address and identical length.
Differences Between 8086 and
80386 Processors


REAL-ADDRESS MODE 


DIFFERENCES FROM 8086 


In general, the 80386 in real-address mode will
correctly execute ROM-based software designed for the 8086,
8088, 80186, and 80188. Following is a list of the minor
differences between 8086 execution on the 80386 and
on an 8086.


1. Instruction clock counts.

   The 80386 takes fewer clocks for most instructions
   than the 8086/8088. The areas most likely to be
   affected are:

   • Delays required by I/O devices between I/O
     operations.

   • Assumed delays with 8086/8088 operating in
     parallel with an 8087.


2. Divide exceptions point to the DIV instruction.

   Divide exceptions on the 80386 always leave the
   saved CS:IP value pointing to the instruction that
   failed. On the 8086/8088, the CS:IP value points to
   the next instruction.


3. Undefined 8086/8088 opcodes.

   Opcodes that were not defined for the 8086/8088
   will cause exception 6 or will execute one of the
   new instructions defined for the 80386.


4. Value written by PUSH SP.

   The 80386 pushes a different value on the stack for
   PUSH SP than the 8086/8088. The 80386 pushes the
   value of SP before SP is decremented as part of the
   push operation; the 8086/8088 pushes the value of
   SP after it is decremented. If the value pushed is
   important, replace PUSH SP instructions with the
   following three instructions:

       PUSH BP
       MOV  BP, SP
       XCHG BP, [BP]

   This code functions as the 8086/8088 PUSH SP
   instruction on the 80386.
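
The difference, and the effect of the three-instruction replacement, can be traced with a small Python model. This is a sketch under simplifying assumptions: the stack is modeled as a dictionary from 16-bit offsets to 16-bit words, a push is assumed to lower SP by 2, and all function names are invented for this example.

```python
def push_sp_8086(sp, stack):
    """8086/8088 behavior: the value stored is SP after it has been adjusted."""
    sp = (sp - 2) & 0xFFFF
    stack[sp] = sp
    return sp


def push_sp_80386(sp, stack):
    """80386 behavior: the value stored is SP as it was before the push."""
    value = sp
    sp = (sp - 2) & 0xFFFF
    stack[sp] = value
    return sp


def push_sp_replacement(sp, bp, stack):
    """PUSH BP / MOV BP, SP / XCHG BP, [BP] executed on an 80386."""
    sp = (sp - 2) & 0xFFFF      # PUSH BP
    stack[sp] = bp
    bp = sp                     # MOV BP, SP
    addr = bp                   # XCHG BP, [BP]: swap BP with the word it points to
    bp, stack[addr] = stack[addr], addr
    return sp, bp


if __name__ == "__main__":
    old, new, fixed = {}, {}, {}
    print(hex(old[push_sp_8086(0x100, old)]))
    print(hex(new[push_sp_80386(0x100, new)]))
    sp, bp = push_sp_replacement(0x100, 0x1234, fixed)
    print(hex(fixed[sp]), hex(bp))
```

In this model the replacement sequence leaves the same word on the stack as the 8086/8088 PUSH SP and restores BP to its original value.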


5. Shift or rotate by more than 31 bits.

   The 80386 masks all shift and rotate counts to the
   low-order five bits. This MOD 32 operation limits
   the count to a maximum of 31 bits, thereby limiting
   the time that interrupt response is delayed while the
   instruction is executing.
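
The masking rule is easy to state in code. The following Python sketch models a 32-bit SHL with the count reduced modulo 32; the function names are assumptions made for this example.

```python
def effective_shift_count(count):
    """The 80386 keeps only the low-order five bits of a shift or rotate count."""
    return count & 0x1F


def shl32_80386(value, count):
    """Model of a 32-bit SHL on the 80386 with the count masked to 0..31."""
    return (value << effective_shift_count(count)) & 0xFFFFFFFF


if __name__ == "__main__":
    print(effective_shift_count(33))   # behaves like a count of 1
    print(hex(shl32_80386(1, 33)))
```

Code written for the 8086/8088 that relies on a count of 32 or more clearing the operand therefore needs an explicit test of the count on the 80386.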


6. Redundant prefixes.

   The 80386 sets a limit of 15 bytes on instruction
   length. The only way to violate this limit is by
   putting redundant prefixes before an instruction.
   Exception 13 occurs if the limit on instruction length is
   violated. The 8086/8088 has no instruction length
   limit.


7. Operand crossing offset 0 or 65,535.

   On the 8086, an attempt to access a memory
   operand that crosses offset 65,535 (e.g., MOV a
   word to offset 65,535) or offset 0 (e.g., PUSH a word
   when SP = 1) causes the offset to wrap around
   modulo 65,536. The 80386 raises an exception in
   these cases: exception 13 if the segment is a data
   segment (i.e., if CS, DS, ES, FS, or GS is being used
   to address the segment), exception 12 if the segment
   is a stack segment (i.e., if SS is being used).
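
A simplified Python check for this case is sketched below. It assumes a 64K-byte segment limit, as in real-address mode, and models a PUSH with SP = 1 as a two-byte access starting at offset 0FFFFH. The function name operand_exception is hypothetical.

```python
def operand_exception(offset, size, segment_register):
    """Return the 80386 exception number raised when an operand of `size`
    bytes at `offset` runs past offset 65,535, or None if it fits."""
    last_byte = offset + size - 1
    if last_byte > 0xFFFF:
        # Stack-segment accesses raise exception 12, all others exception 13.
        return 12 if segment_register.upper() == "SS" else 13
    return None


if __name__ == "__main__":
    print(operand_exception(0xFFFF, 2, "DS"))             # word at offset 65,535
    print(operand_exception((1 - 2) & 0xFFFF, 2, "SS"))   # PUSH a word when SP = 1
    print(operand_exception(0xFFFE, 2, "DS"))             # fits in the segment
```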


8. Sequential execution across offset 65,535.

   On the 8086, if sequential execution of instructions
   proceeds past offset 65,535, the processor fetches
   the next instruction byte from offset 0 of the same
   segment. On the 80386, the processor raises
   exception 13 in such a case.


9. LOCK is restricted to certain instructions.

   The LOCK prefix and its corresponding output
   signal should only be used to prevent other bus masters
   from interrupting a data movement operation. The
   80386 always asserts the LOCK signal during an
   XCHG instruction with memory (even if the LOCK
   prefix is not used). LOCK may only be used with
   the following 80386 instructions when they update
   memory: BTS, BTR, BTC, XCHG, ADD, ADC,
   SUB, SBB, INC, DEC, AND, OR, XOR, NOT, and
   NEG. An undefined-opcode exception (interrupt 6)
   results from using LOCK before any other
   instruction.


10. Single-stepping external interrupt handlers.

    The priority of the 80386 single-step exception is
    different from that of the 8086/8088. The change
    prevents an external interrupt handler from being
    single-stepped if the interrupt occurs while a
    program is being single-stepped. The 80386
    single-step exception has higher priority than any external
    interrupt. The 80386 will still single-step through an
    interrupt handler invoked by the INT instructions or
    by an exception.


11. IDIV exceptions for quotients of 80H or 8000H.

    The 80386 can generate the largest negative number
    as a quotient for the IDIV instruction. The
    8086/8088 causes exception zero instead.


12. Flags in stack.

    The setting of the flags stored by PUSHF, by
    interrupts, and by exceptions is different from that stored
    by the 8086 in bit positions 12 through 15. On the
    8086 these bits are stored as ones, but in 80386
    real-address mode bit 15 is always zero, and bits 14
    through 12 reflect the last value loaded into them.


13. NMI interrupting NMI handlers.

    After an NMI is recognized on the 80386, the NMI
    interrupt is masked until an IRET instruction is
    executed.


14. Coprocessor errors vector to interrupt 16.

    Any 80386 system with a coprocessor must use
    interrupt vector 16 for the coprocessor error exception.
    If an 8086/8088 system uses another vector for the
    8087 interrupt, both vectors should point to the
    coprocessor-error exception handler.


15. Numeric exception handlers should allow prefixes.

    On the 80386, the value of CS:IP saved for
    coprocessor exceptions points at any prefixes before
    an ESC instruction. On 8086/8088 systems, the
    saved CS:IP points to the ESC instruction.


16. Coprocessor does not use interrupt controller.

    The coprocessor error signal to the 80386 does not
    pass through an interrupt controller (an 8087 INT
    signal does). Some instructions in a coprocessor
    error handler may need to be deleted if they deal
    with the interrupt controller.


17. Six new interrupt vectors.

    The 80386 adds six exceptions that arise only if the
    8086 program has a hidden bug. It is recommended
    that exception handlers be added that treat these
    exceptions as invalid operations. This additional
    software does not significantly affect the existing
    8086 software because the interrupts do not
    normally occur. These interrupt identifiers should not
    already have been used by the 8086 software, because
    they are in the range reserved by Intel. Table 14-2
    describes the new 80386 exceptions.
18. One megabyte wraparound.

    The 80386 does not wrap addresses at 1 megabyte in
    real-address mode. On members of the 8086 family,
    it is possible to specify addresses greater than one
    megabyte. For example, with a selector value
    0FFFFH and an offset of 0FFFFH, the effective
    address would be 10FFEFH (1 Mbyte + 65519). The
    8086, which can form addresses only up to 20 bits
    long, truncates the high-order bit, thereby
    “wrapping” this address to 0FFEFH. However, the 80386,
    which can form addresses up to 32 bits long, does
    not truncate such an address.
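
The arithmetic in the example can be reproduced with a short Python sketch. The function name real_mode_address and its address_bits parameter are assumptions made for this illustration; the calculation itself is the usual real-address-mode rule of shifting the selector left four bits and adding the offset.

```python
def real_mode_address(selector, offset, address_bits):
    """Form a real-address-mode address and keep only `address_bits` bits
    (20 for the 8086, 32 for the 80386)."""
    linear = (selector << 4) + offset
    return linear & ((1 << address_bits) - 1)


if __name__ == "__main__":
    print(hex(real_mode_address(0xFFFF, 0xFFFF, 20)))   # 8086: wraps
    print(hex(real_mode_address(0xFFFF, 0xFFFF, 32)))   # 80386: no wrap
```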
VIRTUAL 8086 MODE


DIFFERENCES FROM 8086 


In general, V86 mode will correctly execute software 
designed for the 8086, 8088, 80186, and 80188. Follow- 
ing is a list of the minor differences between 8086 ex- 
ecution on the 80386 and on an 8086. 


1. Instruction clock counts.

   The 80386 takes fewer clocks for most instructions
   than the 8086/8088. The areas most likely to be
   affected are:

   • Delays required by I/O devices between I/O
     operations.

   • Assumed delays with 8086/8088 operating in
     parallel with an 8087.


2. Divide exceptions point to the DIV instruction.

   Divide exceptions on the 80386 always leave the
   saved CS:IP value pointing to the instruction that
   failed. On the 8086/8088, the CS:IP value points to
   the next instruction.


3. Undefined 8086/8088 opcodes.


Opcodes that were not defined for the 8086/8088 
will cause exception 6 or will execute one of the 
new instructions defined for the 80386. 


4. Value written by PUSH SP.

   The 80386 pushes a different value on the stack for
   PUSH SP than the 8086/8088. The 80386 pushes the
   value of SP before SP is decremented as part of the
   push operation; the 8086/8088 pushes the value of
   SP after it is decremented. If the value pushed is
   important, replace PUSH SP instructions with the
   following three instructions:

       PUSH BP
       MOV  BP, SP
       XCHG BP, [BP]

   This code functions as the 8086/8088 PUSH SP
   instruction on the 80386.


5. Shift or rotate by more than 31 bits.

   The 80386 masks all shift and rotate counts to the
   low-order five bits. This MOD 32 operation limits
   the count to a maximum of 31 bits, thereby limiting
   the time that interrupt response is delayed while the
   instruction is executing.


6. Redundant prefixes.

   The 80386 sets a limit of 15 bytes on instruction
   length. The only way to violate this limit is by
   putting redundant prefixes before an instruction.
   Exception 13 occurs if the limit on instruction length is
   violated. The 8086/8088 has no instruction length
   limit.


7. Operand crossing offset 0 or 65,535.

   On the 8086, an attempt to access a memory
   operand that crosses offset 65,535 (e.g., MOV a
   word to offset 65,535) or offset 0 (e.g., PUSH a
   word when SP = 1) causes the offset to wrap around
   modulo 65,536. The 80386 raises an exception in
   these cases: exception 13 if the segment is a data
   segment (i.e., if CS, DS, ES, FS, or GS is being used
   to address the segment), exception 12 if the segment
   is a stack segment (i.e., if SS is being used).


8. Sequential execution across offset 65,535.


On the 8086, if sequential execution of instructions 
proceeds past offset 65,535, the processor fetches 
the next instruction byte from offset 0 of the same 
segment. On the 80386, the processor raises excep- 
tion 13 in such a case. 


9. 


10. 


ar. 


12. 
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LOCK 1s restricted to certain instructions. 


The LOCK prefix and its corresponding output signal 
should only be used to prevent other bus masters 
from interrupting a data movement operation. The 
80386 always asserts the LOCK signal during an 
XCHG instruction with memory (even if the LOCK 
prefix is not used). LOCK may only be used with 
the following 80386 instructions when they update 
memory: BTS, BTR, BTC, XCHG, ADD, ADC, 
SUB, SBB, INC, DEC, AND, OR, XOR, NOT, and 
NEG. An undefined-opcode exception (interrupt 6) 
results from using LOCK before any other instruction. 
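
A porting tool could apply this rule with a simple membership test. The sketch below is hypothetical (the names `LOCKABLE` and `lock_prefix_result` are not from any real assembler) and only encodes the list given above.

```python
# Instructions that accept a LOCK prefix on the 80386, per the list above.
LOCKABLE = frozenset({
    "BTS", "BTR", "BTC", "XCHG", "ADD", "ADC", "SUB", "SBB",
    "INC", "DEC", "AND", "OR", "XOR", "NOT", "NEG",
})


def lock_prefix_result(mnemonic, updates_memory):
    """Classify a LOCK-prefixed instruction as accepted or faulting."""
    if mnemonic.upper() in LOCKABLE and updates_memory:
        return "locked"
    return "exception 6"  # undefined-opcode exception


print(lock_prefix_result("xchg", True))   # locked
print(lock_prefix_result("MOV", True))    # exception 6
```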


10. Single-stepping external interrupt handlers. 


The priority of the 80386 single-step exception is 
different from that of the 8086/8088. The change 
prevents an external interrupt handler from being 
single-stepped if the interrupt occurs while a 
program is being single-stepped. The 80386 single-step 
exception has higher priority than any external 
interrupt. The 80386 will still single-step through an 
interrupt handler invoked by an INT instruction or 
by an exception. 


11. IDIV exceptions for quotients of 80H or 8000H. 


The 80386 can generate the largest negative number 
as a quotient for the IDIV instruction. The 
8086/8088 causes exception zero instead. 
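
The difference shows up only at the most negative quotient. The following sketch models signed division with truncation toward zero; the function `idiv` and the `DivideError` class are illustrative stand-ins, and the `cpu` argument is an assumption made for the comparison.

```python
class DivideError(Exception):
    """Stands in for exception 0 (divide error)."""


def idiv(dividend, divisor, bits=8, cpu="80386"):
    """Return (quotient, remainder) or raise DivideError on overflow."""
    if divisor == 0:
        raise DivideError("division by zero")
    magnitude = abs(dividend) // abs(divisor)
    negative = (dividend < 0) != (divisor < 0)
    quotient = -magnitude if negative else magnitude  # truncates toward zero
    remainder = dividend - quotient * divisor
    most_negative = -(1 << (bits - 1))        # 80H or 8000H as a signed value
    # The 80386 accepts the most negative quotient; the 8086/8088 does not.
    lowest = most_negative if cpu == "80386" else most_negative + 1
    highest = -most_negative - 1
    if not lowest <= quotient <= highest:
        raise DivideError("quotient out of range")
    return quotient, remainder


print(idiv(-128, 1, bits=8, cpu="80386"))   # (-128, 0)
```

Calling `idiv(-128, 1, bits=8, cpu="8086")` raises `DivideError` in this model, mirroring the exception described above.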


12. Flags in stack. 


The setting of the flags stored by PUSHF, by interrupts, 
and by exceptions is different from that stored 
by the 8086 in bit positions 12 through 15. On the 
8086 these bits are stored as ones, but in V86 mode 
bit 15 is always zero, and bits 14 through 12 reflect 
the last value loaded into them. 
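
In terms of bit masks, the two behaviors look roughly like this. The function names are hypothetical, and the sketch treats the flags word as a plain 16-bit integer.

```python
def flags_image_8086(flags):
    """8086: bits 12 through 15 of the stored flags word are always ones."""
    return (flags | 0xF000) & 0xFFFF


def flags_image_v86(flags):
    """80386 in V86 mode: bit 15 is zero; bits 12-14 keep their last value."""
    return flags & 0x7FFF


print(hex(flags_image_8086(0x0002)))   # 0xf002
print(hex(flags_image_v86(0xF002)))    # 0x7002
```

Code that tests the upper four bits of a pushed flags word to identify the processor depends on exactly this difference.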


13. NMI interrupting NMI handlers. 


After an NMI is recognized on the 80386, the NMI 
interrupt is masked until an IRET instruction is 
executed. 


14. Coprocessor errors vector to interrupt 16. 


Any 80386 system with a coprocessor must use 
interrupt vector 16 for the coprocessor error exception. 
If an 8086/8088 system uses another vector for the 
8087 interrupt, both vectors should point to the 
coprocessor-error exception handler. 


15. Numeric exception handlers should allow prefixes. 


On the 80386, the value of CS:IP saved for 
coprocessor exceptions points at any prefixes before 
an ESC instruction. On 8086/8088 systems, the 
saved CS:IP points to the ESC instruction itself. 


16. Coprocessor does not use interrupt controller. 


The coprocessor error signal to the 80386 does not 
pass through an interrupt controller (an 8087 INT 
signal does). Some instructions in a coprocessor 
error handler may need to be deleted if they deal 
with the interrupt controller. 
Overview of the 80287 and 80387 
Numerical Data Processors 


80287 and 80387 
Numerics Coprocessors 


The 80287 and 80387 are high-performance floating 
point coprocessors for 80386-based systems. The 80287 
makes numerics power available to low-cost 80386 
designs, while the 80387 provides enhanced 
functionality and the highest numerics performance 
available for 32-bit microprocessors. Both implement 
the IEEE 754 floating point standard, with high- 
precision 80-bit architectures and full support for single, 
double, and extended precision operations. Both 
coprocessors offer substantial performance enhance- 
ment over numerics software, are binary-compatible 
with the industry-standard 8087 numerics coprocessor, 
and both are fully supported by Intel and third-party 
high-level languages, as well as the Intel standard 
numerics libraries. 


Product Highlights 

80287 and 80387 

• High-performance 80-bit internal architectures 

• Implement IEEE 754 floating point standard 

• Datatypes include 32-bit single real, 64-bit double 
real, 80-bit extended real, 16-bit word integer, 32-bit 
short integer, 64-bit long integer, and 18-digit BCD 
integer types 

• Object code compatible with 8087 

• Optimized interface with 80386 processor for highest 
possible floating point performance 

• Directly extends 80386 instruction set to include 
trigonometric, logarithmic, exponential, and arithmetic 
instructions for all datatypes 

• Operation completely conforms to 80386 native mode 
operation 


80387 only 

• Full 32-bit interface to 80386 local bus 

• Enhanced trigonometric support 

• Overall performance 1.8 million double-precision 
Whetstones/second 

• CHMOS II technology 


Product Description 


The 80287 and 80387 provide high-performance float- 
ing point capabilities for 80386 designs, with the 80287 
being particularly well-suited for cost-sensitive applica- 
tions and the 80387 for designs that require maximum 
performance. The 80387 is the latest entry in the Intel 
numerics coprocessor family, which started with the 
8087 in 1979 and continued with the 80287 in 1982. 


The 80387 incorporates the same philosophy followed 
throughout the family. First, implement the IEEE 754 
floating point standard. This allows quick system design 
by providing a numerics solution that already implements 
a standard and is guaranteed correct. Second, 
remain object code compatible with previous members 
of the family, the 8087 and 80287. This allows all 
previous software developed for 86 family numerics 
applications to be available for 80386 designs. Finally, 
provide an enhancement in performance to keep floating 
point performance improvements in line with 
microprocessor performance improvements, allowing 
80386-based products to be leaders in numerics 
performance as well as overall performance. 


The following table summarizes the key differences be- 
tween the 80287 and 80387: 


                       80287               80387 

Process Technology     HMOS II             CHMOS III 
Package                40-pin CERDIP       68-pin Ceramic Grid Array 
Data Interface Width   16-bit              32-bit 
Clock Speeds           5, 8, 10, 12 MHz    12, 16 MHz 
Trigonometric Support  Tan, Arctan,        Tan, Arctan, Sin, Cos, 
                       0 <= x <= π/4       Simultaneous Sin-Cos, 
                                           Unlimited Argument Range 


Programmer Model 


Both the 80287 and 80387 contain a stack of eight 80- 
bit registers for numerics function computations. They 
also support seven datatypes: 32-bit short real, 64-bit 
long real, 80-bit extended real, 16-bit word integer, 32- 
bit short integer, 64-bit long integer, and 18-digit 
packed BCD integer. The 80287 and 80387 hold all 
numbers in the extended real format internally. Load 
instructions automatically convert operands represented in 
memory as 16-, 32-, or 64-bit integers, 32- or 64-bit 
floating point numbers, or 18-digit packed BCD num- 
bers into extended real format. Store instructions 
automatically perform the reverse type conversion. This 
capability allows numerics applications to view data in 
the most appropriate form without concern for type con- 
versions. 
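
As one concrete case of these conversions, the sketch below converts between a Python integer and an 18-digit packed BCD operand. The memory layout used here is an assumption not spelled out in the text: ten bytes, two decimal digits per byte with the least significant pair first, and the sign in the top bit of the tenth byte. The function names are hypothetical.

```python
def encode_packed_bcd(value):
    """Pack an integer of up to 18 decimal digits into 10 bytes."""
    if abs(value) >= 10 ** 18:
        raise ValueError("more than 18 decimal digits")
    magnitude = abs(value)
    out = bytearray(10)
    for i in range(9):                       # nine bytes hold 18 digits
        magnitude, pair = divmod(magnitude, 100)
        out[i] = ((pair // 10) << 4) | (pair % 10)
    if value < 0:
        out[9] = 0x80                        # sign bit in the last byte
    return bytes(out)


def decode_packed_bcd(raw):
    """Unpack a 10-byte packed BCD operand into an integer."""
    if len(raw) != 10:
        raise ValueError("packed BCD operands are 10 bytes long")
    value = 0
    for byte in reversed(raw[:9]):           # most significant pair first
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return -value if raw[9] & 0x80 else value


print(encode_packed_bcd(1234)[:2].hex())     # 3412
print(decode_packed_bcd(encode_packed_bcd(-1234)))   # -1234
```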


The 80287 and 80387 provide the full set of 
IEEE-compatible computational instructions. Additionally, 
commonly used constants are provided to simplify 
development of numerics applications. The instruction 
sets provided are summarized in the following table. 


80287 and 80387 Computational Instructions 


Add Real               Square Root 
Add Integer            Scale (fast multiply/divide 
Subtract Real            by power of 2) 
Subtract Integer       Partial Remainder 
Multiply Real          Round to Integer 
Multiply Integer       Extract Exponent and Significand 
Divide Real            Absolute Value 
Divide Integer         Change Sign 
Compare Real           Test for Zero 
Compare Integer        Examine Top of Stack 


80287 and 80387 Constant Instructions 


Load Zero              Load log2 10 
Load One               Load log2 e 
Load Pi                Load log10 2 
                       Load ln 2 
Transcendental Instructions 


                       80287              80387 

Tangent                0 <= x <= π/4      full range 
Arctangent             full range         full range 
Sine                   n.a.               full range 
Cosine                 n.a.               full range 
Simultaneous Sine 
  and Cosine           n.a.               full range 
2^x - 1                0 <= x <= .5       0 <= x <= .5 
y*log2 x               full range         full range 
y*log2 (x+1)           full range         full range 


The above set of instructions allows development of all 
types of numerics applications, such as solids modeling, 
mechanical simulation, robot control, and scientific 
analysis, to name just a few. This is a powerful addition 
to the already powerful capabilities made available by 
the 80386 processor. 


Performance 


The 80287 implements numerics algorithms orders of 
magnitude faster than software implementations. The 
80387 is even faster. The performance of these 
coprocessors executing the standard Whetstone 
benchmark is shown below: 


[Figure: Whetstone benchmark performance of the 80287-10 and the 80387-16; the 80387-16 bar is labeled 1.8] 


This level of performance embodied in the 80287 and 
80387 allows development of powerful, 
numerics-intensive systems. And since the 80287 and 80387 are 
standard, easy-to-use coprocessors, the development of 
these systems is very straightforward and fast. 
Instruction Set of the 80287 Numerical 
Data Processor 


Data Transfer 

                                                              Optional   Clock Count Range 
                                                              8,16 Bit   32 Bit    32 Bit    64 Bit    16 Bit 
                                                              Displace-  Real      Integer   Real      Integer 
                                                              ment 
                                 MF =                                    00        01        10        11 

FLD = LOAD 
Integer/Real Memory to ST(0)     | ESCAPE MF 1  | MOD 0 0 0 R/M | DISP | 38-56     52-60     40-60     46-54 
Long Integer Memory to ST(0)     | ESCAPE 1 1 1 | MOD 1 0 1 R/M | DISP | 60-68 
Temporary Real Memory to ST(0)   | ESCAPE 0 1 1 | MOD 1 0 1 R/M | DISP | 53-65 
BCD Memory to ST(0)              | ESCAPE 1 1 1 | MOD 1 0 0 R/M | DISP | 290-310 
ST(i) to ST(0)                   | ESCAPE 0 0 1 | 1 1 0 0 0 ST(i) |      17-22 

FST = STORE 
ST(0) to Integer/Real Memory     | ESCAPE MF 1  | MOD 0 1 0 R/M | DISP | 84-90     82-92     96-104    80-90 
ST(0) to ST(i)                   | ESCAPE 1 0 1 | 1 1 0 1 0 ST(i) |      15-22 

FSTP = STORE AND POP 
ST(0) to Integer/Real Memory     | ESCAPE MF 1  | MOD 0 1 1 R/M | DISP | 86-92     84-94     98-106    82-92 
ST(0) to Long Integer Memory     | ESCAPE 1 1 1 | MOD 1 1 1 R/M | DISP | 94-105 
ST(0) to Temporary Real Memory   | ESCAPE 0 1 1 | MOD 1 1 1 R/M | DISP | 52-58 
ST(0) to BCD Memory              | ESCAPE 1 1 1 | MOD 1 1 0 R/M | DISP | 520-540 
ST(0) to ST(i)                   | ESCAPE 1 0 1 | 1 1 0 1 1 ST(i) |      17-24 

FXCH = Exchange ST(i) and ST(0)  | ESCAPE 0 0 1 | 1 1 0 0 1 ST(i) |      10-15 
Comparison 

                                                              Optional   Clock Count Range 
                                                              8,16 Bit   32 Bit    32 Bit    64 Bit    16 Bit 
                                                              Displace-  Real      Integer   Real      Integer 
                                                              ment 

FCOM = Compare 
Integer/Real Memory to ST(0)     | ESCAPE MF 0  | MOD 0 1 0 R/M | DISP | 60-70     78-91     65-75     72-86 
ST(i) to ST(0)                   | ESCAPE 0 0 0 | 1 1 0 1 0 ST(i) |      40-50 

FCOMP = Compare and Pop 
Integer/Real Memory to ST(0)     | ESCAPE MF 0  | MOD 0 1 1 R/M | DISP | 63-73     80-93     67-77     74-88 
ST(i) to ST(0)                   | ESCAPE 0 0 0 | 1 1 0 1 1 ST(i) |      45-52 

FCOMPP = Compare ST(1) to        | ESCAPE 1 1 0 | 1 1 0 1 1 0 0 1 |      45-55 
ST(0) and Pop Twice 

FTST = Test ST(0)                | ESCAPE 0 0 1 | 1 1 1 0 0 1 0 0 |      38-48 
FXAM = Examine ST(0)             | ESCAPE 0 0 1 | 1 1 1 0 0 1 0 1 |      12-23 

Constants 

FLDZ = LOAD +0.0 into ST(0)      | ESCAPE 0 0 1 | 1 1 1 0 1 1 1 0 |      11-17 
FLD1 = LOAD +1.0 into ST(0)      | ESCAPE 0 0 1 | 1 1 1 0 1 0 0 0 |      15-21 
FLDPI = LOAD pi into ST(0)       | ESCAPE 0 0 1 | 1 1 1 0 1 0 1 1 |      16-22 
FLDL2T = LOAD log2 10 into ST(0) | ESCAPE 0 0 1 | 1 1 1 0 1 0 0 1 |      16-22 
FLDL2E = LOAD log2 e into ST(0)  | ESCAPE 0 0 1 | 1 1 1 0 1 0 1 0 |      15-21 
FLDLG2 = LOAD log10 2 into ST(0) | ESCAPE 0 0 1 | 1 1 1 0 1 1 0 0 |      18-24 
FLDLN2 = LOAD loge 2 into ST(0)  | ESCAPE 0 0 1 | 1 1 1 0 1 1 0 1 |      17-23 
Arithmetic 

FADD = Addition 

tnteger/Real Memory with ST(0) DISP 3 90-120 108-143 95-125 102-137 


ST(?) and ST(O) 70-100 (Note 1) 
FSUB = Subtraction 


Imeger/Reat Memory with ST(0) “DISP) § = 90-120 108-143 95-125 102-137 


ST(i) and ST(0) | ESCAPE 0 PO | 1 1 1 0 R R/M | 70-100 (Note 1) 


FMUL = Multiplication 
integer/Real Memory with ST(0) | ESCAPE MF 0 MOD 0 0 1 R/M DISP 110-125 130-144 112-168 124-138 





—_—_————. 


ST(«) and ST(0) ESCAPE d Pp | 0 a 90-145 (Note 1) 





ST(1) and ST(0} 


FSQRT = Square Root of ST(0)     | ESCAPE 0 0 1 | 1 1 1 1 1 0 1 0 | 
FSCALE = Scale ST(0) by ST(1)    | ESCAPE 0 0 1 | 1 1 1 1 1 1 0 1 | 
FPREM = Partial Remainder of     | ESCAPE 0 0 1 | 1 1 1 1 1 0 0 0 | 
ST(0) ÷ ST(1) 
FRNDINT = Round ST(0) to         | ESCAPE 0 0 1 | 1 1 1 1 1 1 0 0 | 
Integer 


NOTE: 
1. If P=1 then add 5 clocks. 


FXTRACT = Extract 
Components of ST(0) 

FABS = Absolute Value of 
ST(0) 

FCHS = Change Sign of ST(0) 


Transcendental 

FPTAN = Partial Tangent of 
ST(0) 

FPATAN = Partial Arctangent 
of ST(0) ÷ ST(1) 

F2XM1 = 2^ST(0) - 1 

FYL2X = ST(1) * Log2 
[ST(0)] 

FYL2XP1 = ST(1) * Log2 
[ST(0) + 1] 


Processor Control 

FINIT = Initialize NPX 

FSETPM = Enter Protected 
Mode 

FSTSW AX = Store Status 
Word in AX 

FLDCW = Load Control Word 

FSTCW = Store Control Word 

FSTSW = Store Status Word 

ESCAPE 0 0 1 


Fescaree 001/11 100001 | 10-17 
ESCAPE 0.01/11 100000 | 10-17 





ESCAPE O O 


ESCAPE O O 1 


ESCAPE O O 1 


| ESCAPE 001 
| ESCAPE 001 














215-225 230-243 220-230 224-238 


193-203 (Note 1) 


180-186 


32-38 


15-190 


16-50 


Optional Clock Count Range 


8,16 Bit 
Displacement 


27-55 


30-540 


250-800 


310-630 


900-1100 


700-1000 


2-8 


2-6 


[escape 11 1]11 100000 | 10-16 


ESCAPE 0 O 


ESCAPE 0 O 1 


| ESCAPE 


321, 








12-18 


12-18 
FCLEX = Clear Exceptions 

FSTENV = Store Environment 

FLDENV = Load Environment 

FSAVE = Save State 

FRSTOR = Restore State 

FINCSTP = Increment Stack 
Pointer 

FDECSTP = Decrement Stack 
Pointer 

FFREE = Free ST(i) 

FNOP = No Operation 


ESCAPE 0 11/11 100010 2-8 
| ESCAPE 0 0 1 | MOD 1 1 0 RIM | o1sP 40-50 
| ESCAPE 0 0 1 | MOD 1 0 0 RIM | DISP | 35-45 
ESCAPE 1 0 1 | MOO 1 t O RIM | ose | 205-215 


ESCAPE 1 0 1 MOD 1 0 O R/M BISP ; 205~215 





ESCAPE 0 0 1 6-12 


jESGAPE O At] ts * 14 oF 1 Of 6-12 


ESCAPE 10 1 





1 oe 0 Sto 
| escapee 0 01] 11010000 fay a 8 “1 BO OF OG 10-16 





NOTES: 


1. If mod = 00 then DISP = 0*, disp-low and disp-high are absent. 
   If mod = 01 then DISP = disp-low sign-extended to 16 bits, disp-high is absent. 
   If mod = 10 then DISP = disp-high; disp-low. 
   If mod = 11 then r/m is treated as an ST(i) field. 


2. If r/m = 000 then EA = (BX) + (SI) + DISP 
   If r/m = 001 then EA = (BX) + (DI) + DISP 
   If r/m = 010 then EA = (BP) + (SI) + DISP 
   If r/m = 011 then EA = (BP) + (DI) + DISP 
   If r/m = 100 then EA = (SI) + DISP 
   If r/m = 101 then EA = (DI) + DISP 
   If r/m = 110 then EA = (BP) + DISP 
   If r/m = 111 then EA = (BX) + DISP 

   *Except if mod = 00 and r/m = 110 then EA = disp-high; disp-low 


3. MF = Memory Format 
   00 - 32-bit Real 
   01 - 32-bit Integer 
   10 - 64-bit Real 
   11 - 16-bit Integer 


4. ST(0) = Current stack top 
   ST(i) = i-th register below stack top 


5. d = Destination 
   0 - Destination is ST(0) 
   1 - Destination is ST(i) 


6. P = Pop 
   0 - No pop 
   1 - Pop ST(0) 


7. R = Reverse: When d = 1 reverse the sense of R 
   0 - Destination (op) Source 
   1 - Source (op) Destination 


8. For FSQRT:   -0 <= ST(0) <= +∞ 
   For FSCALE:  -2^15 <= ST(1) < +2^15 and ST(1) integer 
   For F2XM1:   0 <= ST(0) <= 2^-1 
   For FYL2X:   0 < ST(0) < ∞ 
                -∞ < ST(1) < +∞ 
   For FYL2XP1: 0 <= |ST(0)| < (2 - √2)/2 
                -∞ < ST(1) < ∞ 
   For FPTAN:   0 <= ST(0) <= π/4 
   For FPATAN:  0 <= ST(0) < ST(1) < +∞ 


9. ESCAPE bit pattern is 11011. 
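

The field rules in these notes can be exercised with a small decoder. The sketch below is illustrative only: `decode_escape` and its result keys are hypothetical names, it handles just the first two instruction bytes, and it reports the 16-bit effective-address form as text rather than computing an address.

```python
# Effective-address base terms for r/m = 000 through 111 (16-bit forms).
EA_16 = ["(BX) + (SI)", "(BX) + (DI)", "(BP) + (SI)", "(BP) + (DI)",
         "(SI)", "(DI)", "(BP)", "(BX)"]


def decode_escape(first, second):
    """Split the first two bytes of a coprocessor instruction into fields."""
    if first >> 3 != 0b11011:
        raise ValueError("not an ESCAPE opcode")
    mod = second >> 6
    rm = second & 0b111
    if mod == 0b11:
        operand = "ST(%d)" % rm                 # r/m names a stack register
    elif mod == 0b00 and rm == 0b110:
        operand = "disp-high:disp-low"          # direct address exception
    else:
        suffix = {0b00: "", 0b01: " + disp8", 0b10: " + disp16"}[mod]
        operand = EA_16[rm] + suffix
    return {
        "low3": first & 0b111,                  # bits following ESCAPE
        "mod": mod,
        "op": (second >> 3) & 0b111,
        "rm": rm,
        "operand": operand,
    }


print(decode_escape(0xD9, 0xC1)["operand"])     # ST(1)
print(decode_escape(0xD9, 0x46)["operand"])     # (BP) + disp8
```

For a memory-format instruction, the upper two bits of `low3` carry the MF field from note 3.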
Instruction Set of the 80387 
Numerical Data Processor 


80387 EXTENSIONS TO THE 80386 INSTRUCTION SET 

Instructions for the 80387 assume one of the five forms 
shown in the following table. In all cases, instructions 
are at least two bytes long and begin with the bit pattern 
11011B, which identifies the ESCAPE class of instruction. 
Instructions that refer to memory operands specify 
addresses using the 80386 addressing modes. 


      First Byte                Second Byte              Optional Fields 

1 | 11011 | OPA | 1       | MOD | 1 | OPB | R/M   | SIB | DISP | 
2 | 11011 | MF  | OPA     | MOD | OPB     | R/M   | SIB | DISP | 
3 | 11011 | d | P | OPA   | 1 1 | OPB     | ST(i) | 
4 | 11011 | 0 | 0 | 1     | 1 1 | 1 | OP          | 
5 | 11011 | 0 | 1 | 1     | 1 1 | 1 | OP          | 


OP = Instruction opcode, possibly split into two 
fields OPA and OPB 

MF = Memory Format 
00 - 32-bit real 
01 - 32-bit integer 
10 - 64-bit real 
11 - 16-bit integer 

P = Pop 
0 - Do not pop stack 
1 - Pop stack after operation 

ESC = 11011 

d = Destination 
0 - Destination is ST(0) 
1 - Destination is ST(i) 

R XOR d = 0 - Destination (op) Source 
R XOR d = 1 - Source (op) Destination 

ST(i) = Register stack element i 
000 = Stack top 
001 = Second stack element 
  . 
  . 
  . 
111 = Eighth stack element 
Instruction 

DATA TRANSFER 

FLD = Load(a) 
Integer/real memory to ST(0) 
Long integer memory to ST(0) 
Extended real memory to ST(0) 
BCD memory to ST(0) 
ST(i) to ST(0) 

FST = Store 
ST(0) to integer/real memory 
ST(0) to ST(i) 

FSTP = Store and pop 
ST(0) to integer/real memory 
ST(0) to long integer memory 
ST(0) to extended real 
ST(0) to BCD memory 
ST(0) to ST(i) 

FXCH = Exchange 
ST(i) and ST(0) 

COMPARISON 

FCOM = Compare 
Integer/real memory to ST(0) 
ST(i) to ST(0) 

FCOMP = Compare and pop 
Integer/real memory to ST(0) 
ST(i) to ST(0) 

FCOMPP = Compare and pop twice 
ST(1) to ST(0) 

FTST = Test ST(0) 

FUCOM = Unordered compare 

FUCOMP = Unordered compare 
and pop 

FUCOMPP = Unordered compare 
and pop twice 

FXAM = Examine ST(0) 


CONSTANTS 

FLDZ = Load +0.0 into ST(0) 
FLD1 = Load +1.0 into ST(0) 
FLDPI = Load pi into ST(0) 
FLDL2T = Load log2(10) into ST(0) 


Encoding — 
a 


























Bytes 2-6 








| ESCMF1 | MODOOOR/mM | SiB/OISP | 20 45-52 25 61-65 
ESC 111 MOD 101A/M |  Si8/DISP 56-67 
ESC 011 MOD 101 R/M S18/DISP 44 
ESC 111 | MOD100A/M | SIB/DISP 266-275 
3 13000 ST(i) | 14 
| ESCMF1 | MOOO10R/M | _ SIB/0ISP 44 79-93 45 62-05 
11010 ST(i) 1 
| ESCMF1 | MODO11A/M |  SIB/OISP 44 79-93 45 82-05 
ESC 111 MOD 111 R/M SIB/O1SP 80-97 
ESC011 | MOD111A/M | SiB/OISP 53 
ESC 111 MOD 110 R/M SiB/OISP 512-534 
Esc10: | 11001ST() | 12 
| €SCoot 11001 STG) _| 18 
ESCMFO | MOD010R/M | _ SIB/OISP 26 56-83 31 71-75 
| Escooo | 1t010STF)_ | 24 
ESCMFO | MODOWA/M | _ sSiB/oISP, | 26 56-69 31 71-75 
11011 STi) __ 26 
| €8C110 | 11011001 _ ‘| 26 
26 
2 
| €sc101 | 111018TH | a6 
ESC 010 1110 1001 a9 
1 | 100101 30-38 
[ Escoo1 | wtoino | 20 
x 
40 
[ escoor | 11101001 | 40 


Shaded areas indicate instructions not available in 8087/80287. 


NOTE: 


a. When loading single- or double-precision zero from memory, add 5 clocks. 
Instruction 


CONSTANTS (Continued) 
FLDL2E = Load log2(e) into ST(0) 
FLDLG2 = Load log10(2) into ST(0) 
FLDLN2 = Load loge(2) into ST(0) 
ARITHMETIC 
FADD = Add 
Integer/real memory with ST(0) 
ST(i) and ST(0) 
FSUB = Subtract 
Integer/real memory with ST(0) 
ST(i) and ST(0) 
FMUL = Multiply 
Integer/real memory with ST(0) 
ST(i) and ST(0) 
FDIV = Divide 
Integer/real memory with ST(0) 
ST(i) and ST(0) 
FSQRT(i) = Square root 
FSCALE = Scale ST(0) by ST(1) 
FPREM = Partial remainder 
FPREM1 = Partial remainder (IEEE) 
FRNDINT = Round ST(0) to integer 
FXTRACT = Extract components of ST(0) 
FABS = Absolute value of ST(0) 
FCHS = Change sign of ST(0) 


NOTES: 


 Optionat | 32-Bit 
Bytes 2-6 Real 


|__ ESC 001 


Encoding 


1110 1010 


[esc 001 
| ESC 001 | 1110 1101 | 


|__ESC MF 0 


| MOD 000 A/M 


| ESCdPO | 11000 ST(i) | 


ESC MF 0 
| eESCadPO 


| ESCMFO 
| ESCdPO 


| ESCMFO 


MOD 10RA/M | 
| 1NORR/M | 


| MOD001R/M | 
| 11001 A/M 


| MOD 11 AR/M | 


| ESCd PO | 1111 A R/M | 
| ESCOO1 | 1111 1010 | 


| ESC 001 | 1111 1101 | 
| _€SCoo1 | 11111000 ‘| 


__Escoor | 11110101 __| 
ESC 001 11111100 


|  €SCoo1 


| 11110100 | 


| €SCoo1 11100001 | 
ESCoo1. | 11100000 ~*:| 


b. Add 3 clocks to the range when d = 1. 
c. Add 1 clock to each range when R = 1. 
d. Add 3 clocks to the range when d = 0. 
e. typical = 52 (When d = 0, 46-54, typical = 49). 
f. Add 1 clock to the range when R = 1. 
g. 135-141 when R = 1. 
h. Add 3 clocks to the range when d = 1. 
i. -0 <= ST(0) <= +∞. 


SiB/DISP 


'S1B/DISP 


SIB/DISP 


S!B/DISP 


24-32 


24-32 


27-35 


89 


Clock Count Range 


32-Bit 
integer 


64-Bht 16-Bit 
Real integer 





4| 


4} 


57-72 29-37 71-85 


23-316 
57-82 


28-36 71-83¢ 


26-349 
61-82 


32-57 76-87 


29-579 
120-127! 94 136-1409 
gah 
122-129 
67-86 
74-155 


96-185 


66-80 


70-76 
22 
24-25 





Instruction 


TRANSCENDENTAL 

FCOS(k) = Cosine of ST(0) 
FPTAN(k) = Partial tangent of ST(0) 
FPATAN = Partial arctangent 
FSIN(k) = Sine of ST(0) 
FSINCOS(k) = Sine and cosine of ST(0) 
F2XM1(l) = 2^ST(0) - 1 
FYL2X(m) = ST(1) * log2(ST(0)) 
FYL2XP1(n) = ST(1) * log2(ST(0) + 1.0) 


PROCESSOR CONTROL 

FINIT = Initialize NPX 
FSTSW AX = Store status word 
FLDCW = Load control word 
FSTCW = Store control word 
FSTSW = Store status word 
FCLEX = Clear exceptions 
FSTENV = Store environment 
FLDENV = Load environment 
FSAVE = Save state 
FRSTOR = Restore state 
FINCSTP = Increment stack pointer 
FDECSTP = Decrement stack pointer 
FFREE = Free ST(i) 
FNOP = No operation 


NOTES: 


j. These timings hold for operands in the range |x| < π/4. For operands not in this range, up to 76 additional clocks may be 
needed to reduce the operand. 
k. 0 <= |ST(0)| < 2^63. 
l. -1.0 <= ST(0) <= 1.0. 


Encoding 


Byte Optional 
| 1 Bytes 2-6 


[escoor [ oasstaty | 
| ESC 001 | 11110010 | 


ESC 001 11110011 








[ escoor | 11110000 | 
[ escoor | 11110001 | 





ESC 001 11111001 | 


| ESC111 
esc 101 | MOdin1A™ | 





ESC 101 | MOD 111 A/M 


Esco1 | 11100010 | 


| €SC001 | MOD110A/M | 

| €SC001 | MOD100R/M | 

| €SC101 | MOD110R/M | 
ESC 101 MOD 100 R/M 


| _€seoo1 | ition | 


| esc1o1 | 11000STH) | 





m. 0 <= ST(0) < ∞, -∞ < ST(1) < +∞. 
n. 0 <= |ST(0)| < (2 - SQRT(2))/2, -∞ < ST(1) < +∞. 
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SIB/DISP_ 


SIB/DISP 


SIB/DISP 
SIB/DISP 
SIB/DISP 
SIB/DISP 


Clock Count Range 


123-7723 
191-497) 
314-487 
122-7711 
194-609) 
211-476 
120-538 
257-547 
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103-104 
71 
3765-376 
308 
24 
22 
18 
12 


Appendix | 


MOD (Mode field) and R/M (Register/Memory 
specifier) have the same interpretation as the 
corresponding fields of 80386 instructions (refer to the 80386 
Programmer's Reference Manual). 


SIB (Scale Index Base) byte and DISP (displacement) 
are optionally present in instructions that have MOD 
and R/M fields. Their presence depends on the values of 
MOD and R/M, as for 80386 instructions. 


The instruction summaries that follow assume that the 
instruction has been prefetched, decoded, and is ready 
for execution; that bus cycles do not require wait states; 
that there are no local bus HOLD requests delaying 
processor access to the bus; and that no exceptions are 
detected during the instruction execution. If the 
instruction has MOD and R/M fields that call for both base and 
index registers, add one clock. 
" (around characters), 34 

[,] (around memory addresses), 34 
: (in front of labels), 34 

? (undefined initial value), 81 

; (indicating a comment), 34 


A 


A (accessed) bit, 162 
Aborts, 213, 214 
Absolute value, 114-115 
Accessed (A) bit in segment descriptor, 162 
Accumulator, 34 
Active-low, 232 
ADC instruction, 59 
Adding entries to a list, 110, 114 
Add-on 80386 CPU boards, 238 
Addressable domain restrictions, 175, 178 
Addressing in subroutines, 116-117 
Addressing modes, 45-48 
definitions, 45 
examples, 46 
execution time, 48 
list, 45 
overlapped execution, 5, 48 
summary, 47 
Address pipelining, 231, 247 
Address size, 78-79 
Address size override, 78 
Address status (ADS#) signal, 238, 245 


Address translation, 8 
8086, 157-158 
paging, 167-173 
protected mode, 165-166 
segmentation, 157-167 
ADS# signal, 238, 245 
AF (auxiliary carry flag), 38 
Aliases (of descriptors), 202 
ALIGN directive, 117 
Aligning memory accesses, 117 
Alphabetical listing of instruction set, 49-53 
Applications, 9-20 
list, 9-10 
Arithmetic, multiple-precision, 108-109 
Arithmetic and logical instructions, 58-59, 66 
carry, including of, 59 
LEA, 58 
order of operands, 58 
ARPL instruction, 178 
Array bounds, 102-103, 119 
Array manipulation, 101-103 
Artificial intelligence, 18 
ASCII, 42 
ASCII (unpacked BCD) arithmetic instructions, 
66 
ASCII characters, 104-108 
Asserted state, 232 
Associative caches, 261, 262-263, 265 
Autodecrementing, 63, 102, 105 
Autoincrementing, 63, 102, 105 
Auxiliary Carry flag, 38 


B 


B (big) bit in data segment descriptor, 177 
B (busy) bit in task state descriptor, 195 
use, 201 
Background tasks, 204 
Back link (in task state segment), 192, 201 
removing, 201 
unused, 202 
Barrel shifter, 5, 79 
Base, 46 
Based index addressing with displacement, 45 
examples, 46 
extra clock cycle, 48 
Base pointer, 35, 116 
Base register, 34 
Baud rate generator, 129 
BCD (binary-coded-decimal) representation, 42 
Benign exceptions, 220 
Big (B) bit in data segment descriptor, 177 
BIOS, 61 
Bit manipulation, 91-92 
bit manipulation instructions, 91 
DEC, 92 
examples, 91-92 
INC, 92 
logical instructions, 91-92 
Bit manipulation instructions, 66, 70, 88, 91 
Bit pattern comparison, 92 
Bit scan instructions, 66, 80 
Bit string, 42 
Bit testing, 96-97 
BLD386 program, 183-184, 205-207 
Block fill, 113-114 
Block input/output, 122, 125 
Block size (in a cache), 260, 261 
Boolean values, 71 
BOUND instruction, 102-103, 119, 213, 220 
Bounds check exception, 220 
Bounds checking, 102-103 
Brackets around memory addresses, 34, 90 
Breakpoints, 220, 226 
BS16# signal, 238 
BT instruction, 91, 97, 136 


BTC instruction, 91 

BTR instruction, 91, 136 

BTS instruction, 91 

Buffered I/O, 136, 141-142 

Buffer pointers, 142 

Bus-based systems, 235-236 

Bus control logic, 255-257 

Bus cycle types, 242, 247 

Bus interface unit, 8 

Bus master, 242 

Bus operation, 242-249 
cycle types, 242, 247 
interrupt acknowledge cycles, 247-249 
non-pipelined cycles, 242-247 
pipelined cycles, 247 

Bus performance, 249 

Busy (B) bit in task state descriptor, 195, 201 

BUSY# signal, 241 

Busy task, 201 

Byte, 42 

Byte-addressable registers, 36 

Byte enable (BE) signals, 235 

BYTE PTR operator, 56 


C 


C (conforming) bit, 183 

Cache, 31, 154, 260-268 
associative, 261, 262-264, 265 
block size, 261 
considerations, 260 
controller, 260-261 
direct-mapped, 261, 264-265 
fully associative, 261, 262-263 
non-cacheable memory, 267-268 
organizations, 261-262 
performance, 268 
set-associative, 262, 265 
types, 261-262 
updating, 265-267 

Cache coherency, 267 

Cache controller, 260-261, 265, 266, 268 

Cache organization, 261-262 


Cache performance, 268 
Cache updating, 265-267 
CAD/CAM/CAE systems, 14-17 
application areas, 17 
tasks, 14-17 
typical systems, 14 
Call gates, 179-183 
entry points, 180 
format, 179-180 
multiple gates, 181 
new privilege level, 180 
parameter count, 181 
stack change, 181 
Carry flag, 38 
arithmetic, 59 
bit manipulation instructions, 38, 66 
bit test, 97 
CMP, 99 
comparing equal values, 118 
DEC, 79 
INC, 79 
logical instructions, 38, 79 
multiple-precision arithmetic, 79, 108 
purpose, 38 
shift instructions, 80, 94 
Case statement, 104 
CF (carry flag), 38 
Character, 42 
Character manipulation, 104-107 
CL register, use in shifts, 59 
Clear instruction, 59 
Clearing bits, 91 
Clearing registers, 59 
CLI instruction, 140 
CLK2 signal, 242 
Clock counts for instructions, 272-285 
CMP (compare) instruction, 59, 97-100 
Code conversion, 107-108, 113 
Code prefetch unit, 8 
Code segment (CS) register, 38 
interrupt descriptor table, 139 
Coherency, 267 
Common programming errors, 118-119 
Compaq Deskpro 386, 10 




Comparing bit patterns, 92 
Comparing values, 97-100 
order of operands, 97 
signed values, 100 
unsigned values, 99-100 
Compatibility, 29-30, 36, 37, 305-313 
Compilation, 5 
Complementing bits, 91, 92 
Conditional jumps, 60-61, 96-100 
comparisons, 60, 97-100 
frequently used, 55, 60-61 
JECXZ, 71 
LOOP, 71, 100 
mnemonics, 60 
signed, 100 
summary, 72-73 
unsigned, 99-100 
Conditional LOOPs, 71, 101 
Conditional REPs, 63, 107, 113, 114 
Conforming code segments, 183 
Context of a task, 188 
Contributory exceptions, 220-222 
Control registers, 40 
page directory base register (CR3), 169 
Control transfer restrictions, 178-183 
Conversion instructions, 61, 62 
flags, effects on (none), 80 
Coprocessors, 250-255, 315-328 
addresses, 250 
bus cycles, 253 
characteristics, 250 
80287/80387 recognition, 254 
exceptions, 255 
instructions, 250 
instruction sets, 319-328 
interfaces, 250-253 
performance, 315 
recognition, 254 
signals, 241 
task switch, 200, 255 
Coprocessor signals, 241 
Copy (MOV) instruction, 54 
Count (ECX) register, 34, 59, 71, 101 
CPL (current privilege level), 178, 183 
D 


D (default operand/address size) bit, 75, 163 
D (dirty) bit, 172 
DAA instruction, 66, 109 
DAS instruction, 66 
Data breakpoints, 226, 228 
Data bus sizing, 231, 234, 235 
Data/Control (D/C#) signal, 236 
Data manipulation instructions, 58-59, 63-70 
Data register, 34 
Data segment descriptor, 300 
Data segment registers, 38, 40 
Data size, 56, 75 
Data storage, order of bytes, 44 
Data structures, 109-111 
Data transfer instructions, 54-58, 61-62 
flags, effect on (none), 79 
frequently used, 54-58 
general, 61-62 
Data types, 42-45 
names, 42 
storage, 44-45 
DB directive, 81, 84 
DD directive, 81, 84 
Deadlock, 27 
Debug exceptions, 219, 229 
Debugging features, 226-227 
Debug registers, 226, 227-229 
address registers, 226, 228 
control register, 228 
status register, 228 
Decimal arithmetic instructions, 66, 109 
DEC instruction, 79, 90, 92 
bit manipulation, 92 
Decisionmaking, 96-101 
Default address/operand size bit, 75, 163 
Decoding, 8 
Demand-paged system, 23 
Departmental computing, 11 
Descriptor privilege level (DPL), 162, 178, 183 
task gate, 197-199 
task state segment descriptors, 195 
Descriptors, 161-167 
creation, 183-184 
formats, 299-302 
gate, 179-180 
summary, 299-303 
tables, 302-303 
task state segment, 195-196 
types, 299 
Descriptor tables, 163-164 
Destination index (DI) register, 35 
DF (direction flag), 38, 63 
Direct addressing, 45, 46 
Direction flag (DF), 38, 63 
Directives (assembler), 81, 84 
Direct-mapped caches, 261, 264-265 
Dirty (D) bit in page table entries, 172 
setting, 173 
Disabling interrupts, 140 
Displacement, 45, 46, 90 
Divide errors, 219, 220 
DMA signals, 242 
DMA transfers, 122 
Domain restrictions, 178 
Double faults, 220-222 
Double-length shifts, 88, 92, 94, 96 
program example, 96 
Double word, 42 
DPL (descriptor privilege level), 162, 178, 183, 
195 
DQ directive, 81 
DRAM, 259 
DRAM controller, 258, 259 
DT directive, 81 
Dump utility, 45 
DUP operator, 84 
Dword, 42 
DWORD override, 101 
DWORD PTR operator, 56, 90 
DW directive, 81, 84 
Dynamic data bus sizing, 231, 235 
Dynamic RAM, 259 
Dynamic RAM controller, 258, 259 


E 


E (expansion-direction) bit, 176-178 
Early-out multiplication algorithm, 79, 88 
Effective addresses, 45, 47, 56 
EFLAGS register, 37-38. 
See also flags. 
8-bit registers, 36-37 
8086/8088 comparison, 29-30, 80-81, 83-84 
80286 differences, 80-81, 83 
memory capacity, 30 
real mode, 309-311 
register restrictions, 37 
speed, 30 
virtual 8086 mode, 312-313 
80286 comparison, 29-30, 80-81, 83 
memory capacity, 30 
new instructions, 83 
porting programs, 306 
protected mode, 305-308 
real mode, 308 
register restrictions, 37 
speed, 30 
80287 numeric coprocessor, 250, 315-317 
instruction set, 319-323 
interface, 250-251 
recognition, 254 
80387 numeric coprocessor, 241, 315-317 
instruction set, 325-328 
interface, 253 
read cycles, 253 
recognition, 254 
8237 DMA controller, 149 
8250 ACE, 127-129 
8253 PIT, 132-133 
8254 PIT, 132-133 
8255 PPI, 127, 130-132 
8259 Programmable Interrupt Controller (PIC), 
143-147 
EOI command, 144, 147 
features, 143-144 
initialization sequence, 144 
82258 Advanced Direct Memory Access 
Coprocessor (ADMA), 150 
82384 Clock Generator, 239 
82385 Cache Controller, 261, 265, 268 
EM (emulate coprocessor) bit, 254 
Emulating coprocessor instructions, 254, 255 
Enabling interrupts, 140 
END directive, 81 
ENTER instruction, 116 
Entry point restrictions, 175, 178-183 
EQU directive, 81, 84 
ERROR# signal, 241, 254 
coprocessor recognition, 254 
exceptions, 255 
Errors, programming, 118-119 
ESC instruction, 71, 250, 253, 255 
ET (extension type) bit, 239, 254 
EVEN directive, 117 
Even parity, 38 
Exception error codes, 217-219 
Exception handlers, 183 
conforming segments, 183 
gates, 217-218 
privilege level, 217 
task implementations, 199 
Exceptions, 137, 211-226 
classes, 213-215 
conditions, 219-220 
coprocessor, 255 
new 80386 features, 212 
sources, 212-214 
types, 212-213 
Exchanging elements, 102 
Executable segment descriptor, 300 
Execution unit, 8 
Expand-down segments, 176-178 
Expansion-direction (E) bit, 176-178 
Extended registers, 34-36 
External signals, 232-242
Faster programs, 117-118
Faults, 213-214 
File server, 13 
Filling a pipeline, 8
Filling memory, 113-114 
Flag cross-reference, 295-296 
Flags, 37-38, 295-298 
compatibility, 37 
cross-reference, 295-296 
diagram, 37 
functions, 297 
instructions, effects of, 79-80 
major flags, 37-38 
summary, 297-298 
FLAGS register, 37 
Flag summary, 297-298 
Floating point (IEEE 754) numbers, 42-43, 250 
Flushing page cache, 173 
Foreground tasks, 204 
Frame, 22, 116 
Frame pointer, 116 
Framing, 129 
Frequently used instructions, 48, 54-62 
arithmetic and logical instructions, 58-59
data transfer instructions, 54, 56-58 
program control instructions, 60-61 
Fully associative caches, 261, 262-263 
Functional units, 7-8
Future advances, 30-31
G (granularity) bit, 163

Gate descriptors, 179-180, 301 
not-present bit, 223 

GB, 2, 3 

Generalized architecture, 5 

Gigabit, 42 

Gigabyte, 2, 3 

Global descriptor table, 163, 302 
shared memory, 202
Global descriptor table register, 163
GOTOs, 9 

Granularity (G) bit, 163 

Graphics, 14-15
Handshaking signals, 132

Head pointer, 142 

Hex digit conversion, 113 

High-level languages, 29 
features required, 29 
instructions, 77 

Hit (in a cache), 260 

Hit ratio, 260 

HLDA signal, 242 

HOLD signal, 242
IDT register, 139-140
IEEE 754 (floating point) representation, 42-43, 
250 
IF (interrupt enable flag), 38, 137 
Image processing, 19-20 
Immediate addressing, 45, 46 
INC instruction, 90, 98 
bit manipulation, 92 
Increment with carry, 79 
Index, 45 
Index addressing, 45, 46 
Indirect jumps, 60, 104 
IN (input) instruction, 56-57, 124-125, 236 
Initial state of processor after RESET, 239-241 
INS instruction, 105, 125 
Instruction breakpoints, 226-229 
Instruction continuation, 214 
Instruction decode unit, 8
Instruction execution unit, 8
Instruction fetch, 8
Instruction pointer, 36, 37
interrupt descriptor table, 139
Instruction prefetch queue, 8, 185
Instruction restart, 23, 214 
Instruction resumption, 23 
Instruction set, 48-74, 272-293 
address length, 75, 79 
alphabetical listing, 49-53 
arithmetic and logical instructions, 58-59, 
63, 66-67 
clock count summary, 272-285 
data manipulation instructions, 58-59, 63, 
66-67 
data transfer instructions, 54, 56-58, 61-62 
encoding, 286-293 
frequently used, 48, 54-61 
general, 61-74 
listing, 49-53 
operand length, 75, 79 
other, 71 
program control instructions, 60-61, 71, 
72-74 
restrictions, 175, 184 
Integers, 42 
Interleaved memory, 259 
Interrupt acknowledge cycles, 236, 242, 247-249 
Interrupt controller, 143-147
Interrupt descriptor table, 139-140, 214-217, 303 
default values, 139-140 
Interrupt descriptor table (IDT) register, 139-140 
Interrupt-driven I/O, 122 
Interrupt enable flag (IF), 38, 137 
Interrupt gates, 216-217 
Interrupt latency, 172
Interrupt priority, 143, 214 
Interrupt-related instructions, 140 
Interrupts, 137-147 
inputs, 137 
priority, 214 
response, 60-61, 138-139 
Interrupt service routine examples, 140-143 
Interrupt tasks, 199
Interrupt type, 137, 8-15
Interrupt vectors, 139, 143, 267
INT (software interrupt) instruction, 60-61
INTO instruction, 219, 220


INTR input, 137, 242 
INT 3 instruction, 220 
Invalid operation code exception, 220 
Invalid task state segment faults, 223 
Inverting bits, 91, 92
I/O addresses, 56, 122-124 
capacity, 56 
instructions, 124-125 
isolated, 122-124 
memory-mapped, 122, 124, 236 
non-segmented, 56, 122 
reserved, 57, 124 
I/O address register (DX), 124, 125 
I/O devices, definition of, 236 
I/O examples, 133, 136 
I/O guard map, 204 
I/O instructions, 124-125, 236 
I/O methods, 121-122 
I/O permission bit maps, 203-205 
characteristics, 205 
example, 205 
uses, 203-204 
IOPL, 203 
V86 mode, 203 
I/O privilege levels, 203 
IRET instruction, 140, 141, 217 
Isolated input/output, 56, 122-124
JECXZ instruction, 71, 101
JMP instruction, 60
Jumps, 9, 60
elimination, 117
indirection, 60
Jump table, 104
Keyboard decoding, 104
Keyboard operations on IBM PC, 61


L


Large memory model, 159
LAR instruction, 166, 225
LDT (local descriptor table) register, 163
LEA instruction, 58
arithmetic uses, 58
Least recently used (LRU) algorithm, 23
LEAVE instruction, 117
LIDT instruction, 140
Limit checking, 102-103, 175
Linear address, 161, 167
division for paging, 168
Linked lists, 109-110
List manipulation, 109-110, 114
Local descriptor table, 163, 302
shared memory, 202
Local descriptor table register (LDTR), 163
LOCK prefix, 71, 307, 308, 310
LOCK# signal, 237-238
Logical addresses, 8
Logical instructions, 91-92
Carry, effect on, 38, 79
Long shifts, 5
Lookup tables, 61, 103-104, 118
Looping, 101
LOOP instruction, 71, 101
Low byte first storage, 44
LRU (least recently used) algorithm, 23
LSL instruction, 166, 225
LSS instruction, 226
LTR instruction, 196-197, 208


M


Machine language, 5
Machine state, 25
Macro Assembler notation, 34
Mailbox, 141
Maximum value, 111-112
MB, 3
Megabyte, 3
Memory banks, 232, 259
Memory boards, 3
Memory capacity, 2-3
comparison to 80286, 2
comparison to 8086, 2
expansion, 3
Intel processors, 2, 4
virtual, 30
Memory chips, 3
Memory interface, 255-259
Memory/IO (M/IO#) signal, 236
Memory management systems, 153-186
initialization, 184-185
Memory management unit (MMU), 21, 154-155
80386, 21
on-chip vs. separate, 155
Memory-mapped I/O, 122, 124, 236
non-cacheable, 267-268
Memory models, 159
Memory-to-memory operations, 56, 66, 90
Memory transfer control signals, 232-239
Memory units, 3
Misaligned transfers, 235, 236
Miss, 260
MMU, 21, 154-155
Mnemonics, alternative, 60
Most significant bit, 38
Motorola 68000 family, 104, 155, 214
MOV (move) instructions, 54, 56, 89, 90
addressing modes, 56
debug registers, 227-228
exchange, 61
order of operands, 54
MOVS instruction, 63, 66, 106
MP (math present) bit, 255
MS-DOS entry points, 61
MS-DOS software, 3-4
80286 protected mode, 4, 167
Multibyte array elements, 46, 48, 102, 103
Multiple-bit shifts, 79
Multiple-precision arithmetic, 108-109


Multiplication methods, 79 


Multiplication using LEA, 58 
Multitasking, 4, 25-27, 188-190 
advantages, 26-27 

applications, 4-5 

arbitration, 27 
background, 204 
controller, 26 
disadvantages, 27 
foreground, 204 
functions, 25 
I/O, 203-205 
I/O-bound, 26, 27 
personal computer, 189 
priority, 25 

Multiuser systems, 11, 13, 27-28 
administration, 28 
I/O, 204 
problems, 28 
uses, 27-28 

Multiword shifts, 94, 96
NA# signal, 238, 247
Negated state, 232 
Nested task (NT) flag, 190, 199, 201 
Next address request (NA#) signal, 238, 247 
NMI input, 137, 242 
Non-busy task, 201 
Non-cacheable memory, 24, 267-268 
Nonexistent address, 247 
Nonmaskable interrupt, 137, 242 
vector, 220, 242 

Non-overlapping segments, 159-160 
Non-pipelined bus cycles, 242-247 
Notation, macroassembler, 34 
NOT instruction, 79 
Not-present descriptor, 302 
NT (nested task) flag, 190, 199, 201 
Null selector, 165, 166 
Numeric coprocessor, 250-255.
See also coprocessors.
OF (overflow flag), 37
OFFSET operator, 84 
Offsets, 42, 157, 168 
Operand size, 75, 79, 89 
Operand size override, 75, 79 
Operating modes, 155-156 
Operating system support, 29 
OS restrictions, avoiding of, 180-181 
OS/2, 4, 30, 80 
OUT instruction, 56-57, 124-125, 236 
Output service routines, 141-142 
OUTS instruction, 106, 125 
Overlays, 21 
Overflow, 100 
Overflow flag (OF), 37 
Overrides: 
address/operand, 75, 79 
segment, 158-159 
Overrun, 129
P (segment present) bit, 162
Packed BCD, 42 
instructions, 66 
Page, 22 
Page cache, 173-174 
Page directory, 167-168 
Page directory base register (CR3), 169 
Page fault, 21, 225-226 
error code, 218 
example, 22-23 
invalid stack pointer, 226 
meaning, 225 
multiple faults, 23 
task switches, 225 
Page frame, 22, 170 
Page present (P) bit, 172 
Page protection, 175-176
Page size, 24
Page table, 22-23, 167-172 
elements, 171 
format, 171-172 
size, 171 
Starting address, 172 
Page unit, 8 
Paging, 22-25, 167-174 
after segmentation, 167 
demand-paged system, 23 
disadvantages, 24 
80386 support, 154 
example, 22-23 
page size, 24 
translation, 167-171
working set, 23 
Paging (PG) bit, 167, 185 
Paging unit, 8 
Parameter count in task gates, 181 
Parameter passing, 115-117 
Parity flag (PF), 38 
Parsing command lines, 107 
Pattern match, 113 
PCs, 11-14 
PE (protection enable) bit, 155, 184, 185 
PEREQ signal, 241, 250 
Performance comparisons, 30 
Personal computers, 11-14 
Personal System/2 computers (IBM), 1, 2, 16 
PF (parity flag), 38 
PG (paging) bit, 167, 185 
Physical addresses, 8 
Pipelined bus cycles, 247 
Pipelining, 6-9 
example, 8 
operation, 8 
programming techniques, 8-9 
Plant controller, 26-27
PL/M function interface, 115 
Pointer, 42 
Polling, 122 
POP instruction, 57-58 
POPAD instruction, 57 
Power users, 11


Precharge time, 259 
Prefetch unit, 8 
Privileged instructions, 184
Privilege levels, 175-176 
changing, 181, 182 
conforming code segments, 183 
descriptors, 175 
exception handlers, 217 
pages, 175-176
user/supervisor systems, 175-176 
V86 mode, 191 
Process, 188 
Processor control instructions, 71, 78
Processor-detected exceptions, 212-214 
Program control instructions, 60-61, 71, 72-74 
frequently used, 60-61 
general, 71, 72-74
Program counter, 36, 37 
Programmable I/O chips, 125-133
advantages, 125-126 
disadvantages, 126-127 
Programmed exceptions, 213 
Programmed I/O, 121-122 
Programming errors, common, 118-119 
Protection, 28-29, 174-184 
Protection enable (PE) bit, 155, 184, 185 
Protection exceptions, 176, 224-225 
Protection model instructions, 71, 78 
Pseudo-operations, 81, 84 
PS/2 computers, 1, 2, 16 
PTR (pointer) operator, 56, 84, 90 
PUSH instruction, 57-58 
PUSHAD instruction, 58
Quad word, 42
storage example, 45 
? (undefined initial value), 81, 84 


Queues, 8 
Qword, 42
RCL instruction, 93, 94
RCR instruction, 93, 94 
Readable (R) bit, 176 
Read-modify-write sequence, 238 
Read/write (R/W) bit, 172 
READY# signal, 238-239, 246 
Real mode, 6, 155-156 

differences from 8086, 309-311 

differences from 80286, 308 
Real-time systems, 24, 204 
Reduced-instruction-set (RISC) machines, 31, 48 
Reentrancy (of tasks), 195 
Refresh, 259 
Register addressing, 45, 46 
Register indirect addressing, 45, 46 
Registers, 34-40 

byte-addressable, 36-37 

clearing, 59 

control, 40 

debug, 40, 227-228 

8-bit, 36-37 

flags, 37-38 

general-purpose, 34-37 

limitations (8086/80286), 37 

limitations (80386), 48 

segment, 38, 39, 40 

setting flags from, 79, 98 

16-bit, 36 

specialized, 40 

special uses, 88 

system address, 39, 40 

test, 40 

user, 34-38 

word-addressable, 36 
Removing entries from a list, 114 
REP (repeat) prefix, 63, 106 

conditional versions, 63, 107, 112-114 

examples, 66, 107, 112-114 

I/O, 125

limitations, 63 

order of steps, 63 

parsing command lines, 107 


Reserved interrupts, 137-138
RESET signal, 239-241 
coprocessor, 254 
Initial state, 239, 240 
Startup address, 239 
Restart, 23 
Resume flag (RF), 227 
RET instruction to change privilege levels, 182 
RF (resume flag), 227 
RISC machines, 31, 48 
Robotics, 17-18 
ROL instruction, 93, 94 
ROR instruction, 94 
RPL (requestor's privilege level), 178 
RS-232 interface, 127
SAL instruction, 94, 95
SAR instruction, 94, 95 
SBB instruction, 59 
Scaled index in addressing, 46 
Scaled indexing, 102, 103, 104 
Scientific users, 11 
Scientific workstations, 11 
Segmentation methods, 157-167 
8086, 157-161 
protected mode, 161-167 
Segmentation unit, 8 
Segment descriptor, 161-163 
Segment limit, 161 
Segment not present exceptions, 223-224 
Segment overrides, 158-159 
not allowed, 159 
Segment present (P) bit, 162 
Segment registers, 38-40, 157 
default assignments, 158 
80386 additions, 40 
hidden parts, 166 
length, 38 
overrides, 158-159 
validating, 224 
Segments, 5, 21
characteristics, 157
size limit, 30, 154 
virtual memory, 21 
Segment type, 161 
Segment unit, 8 
Selector, 161, 164-167 
examples, 164-165 
format, 164 
null, 165 
Self-modifying code, 176 
Semaphores, 238 
Sequential operation, 8 
Set associative caches, 262, 265 
SET cc instruction, 71, 113 
Setting bits, 91-92 
SF (sign flag), 38 
Shared facilities, 28 
Shared memory, 202, 238 
Shift counts, 59 
Shift instructions, 59, 79, 92-96 
SHL instruction, 94, 95 
SHLD instruction, 94, 96 
SHR instruction, 94, 95
SHRD instruction, 94, 96 
Signal pin summary, 233-234 
Signal processing, 19-20 
tasks, 19-20 
typical applications, 20 
Signed conditional jumps, 100 
Sign-extended conversions, 61, 62, 115 
Sign extension, 94 
Sign flag (SF), 38 
16-bit registers, 36 
Slow memory, interfacing of, 238-239 
Small memory model, 159 
Snooping, 268 
Software interrupt (INT instruction), 60-61, 213,
220
Software state, 192 
Source index (SI) register, 36
Speeding up programs, 117-118 
Stack-based parameter passing, 116-117 
Stack cache, 31
Stack exceptions, 224
Stack pointer, 35 
limitation on use, 46 
subroutines, 116-117 
Stacks (software), 110-111 
Stack segment (SS) register, 38, 224, 226 
Stale data, 265 
Standard buses, 235-236 
Startup address, 239 
Startup signals, 239-241
Step size in string instructions, 66 
STI instruction, 140 
Stop bit, 127 
STR instruction, 197 
String, 42 
String comparison, 113 
String instructions, 63, 66, 68 
examples, 66, 105-107 
flag effects, 80 
list, 66, 105-106 
REP prefix, 63, 106-107 
step size, 63 
step direction, 63 
String length, 112-113 
String manipulation, 63, 66, 68, 104-107 
String primitives, 63, 105-106 
Subdepartmental computing, 11, 13 
SUB instruction, 58-59, 79 
Supervisor level, 172, 175-176 
Suspending a task, 189 
Swapping, 23-24 
example, 22-23 
problems, 24 
Switching modes, 6 
System address and segment registers, 39
System Builder (BLD386) program, 183-184, 
205-208 
System segment descriptor, 300 
System-wide resources, 200
Table lookup, 61, 103-104, 118

Tags, 173, 260, 261 
length in caches, 265 

Tail pointer, 142 

Task address spaces, 202 

Task context, 188 

Task gate descriptors, 197-199 

Task gates, 197-199 

Tasking, 25-27, 188-190 
advantages, 190 
80386 features, 190-199 
examples, 188-189 
functionalization, 27 
initialization, 205-208 
interrupt service routines, 199 
priority, 189 
reentrancy, 195 
requirements, 25 
V86 mode, 191, 208 

Task isolation, 27, 190 

Task linking, 201-202 

Task management, 187-209 

Task priority, 189 

Task register, 196-197 

Task state, 188, 199, 200 


Task state segment (TSS) descriptors, 195-196,
initialization, 205-208

Task state segments, 192-195 
dummy, 207-208 
dynamic information, 192 
exceptions, 222, 223 
extended version, 192, 193, 195 
initialization, 205-208 
minimum, 192 
reading, 195 
software state, 192 
static information, 192 
validation, 224 
V86 version, 208 
writing, 195 

Task switch, 25, 26, 199-200 


advantages, 200 
coprocessor, 200, 255 
disadvantages, 200 
instructions causing, 200 
privilege levels, 200 
procedure, 199-200 
Terabyte (TB), 3 
Testing bits, 91, 97 
Testing memory, 98 
TEST instruction, 59, 90, 97, 98 
setting flags from register value, 79, 98 
TF (trap flag), 38, 217, 227 
Thrashing, 23 
TI (table indicator) bit in selector, 164 
Timing diagrams, 242-249 
TLB, 173-174 
Translation lookaside buffer (TLB), 173-174
entry format, 173 
flushing, 173 
memory range, 173 
Transparent routines, 140 
Trap bit (of a task state segment), 227 
Trap flag (TF), 38, 217, 227 
Trap gates, 215, 216 
Traps, 213-214 
TS (task switched) bit, 200 
Two’s complement overflow, 100 
Type checking, 175 
Type conversions, 61 
Type definitions, 75
UART, 126, 127-129
Unbuffered I/O, 133, 136, 140-141 
Unix operating system, 17, 188, 192, 195 
Unmarked numbers, 46, 90 
Unpacked BCD, 42 

instructions, 66 
Unsigned conditional jumps, 60, 99-100 
USE directive, 75 
User level (page privilege), 175-176 
User segments, 175-176
User/supervisor (U/S) bit, 172
User/supervisor systems, 175-176, 203 
Utility program, 183
Validating a TSS, 224
V86 mode, 6, 153, 156 
invalid exit, 225 
I/O permission maps, 204 
privilege level, 191 
tasking, 191 
task initialization, 208 
Virtual devices, 203 
Virtual 8086 (V86) mode, 6, 153, 156 
advantages, 6 
differences from 8086, 312-313 
IOPL, 203 
Virtual 8086 monitor, 6, 154, 191, 203 
Virtual machine, 6 
Virtual memory, 5, 21-25 
advantages, 21 
capacity, 30 
implementations, 21-22, 162 
paged, 22-25 
segmented, 21 
Virtual Mode (VM) flag, 156, 191 
Visible part of a segment register, 166 
VM (virtual mode) flag, 156, 191 


W 
Wait states, 239, 242, 249 
Watchdog timer, 247 
Weitek floating point chip set, 250 
What-if analysis, 16-17 
Word, 42 

storage example, 44-45 
Word-length registers, 36 
WORD PTR operator, 56 
Working set, 23, 173 
Workstations, 11-17 


CAD/CAM/CAE, 14-17 

scientific, 11 

speed comparisons (80286/80386), 30 
Wraparound, 142 
Writable (W) bit, 176 
Write-back updating, 266-267 
Write/Read (W/R#) signal, 236 
Write-through updating, 266-267
XCHG instruction, 61, 102

XLAT instruction, 61, 103 
segment override, 159 

XOR instruction, 59, 92, 115 
clearing registers, 59
Zero-extended conversions, 61

Zero flag (ZF), 38 
bit scan instructions, 80 
carry from INC, 90 
string instructions, 107 
use, 38 


Zero in front of hex numbers, 34 
Zero iterations case, 101 


Zero (null) selector, 165, 166 


Trademarks 


Unix is a registered trademark of AT&T.
Sidekick is a registered trademark of Borland International, Inc. 
System V/386 is a trademark of Intel Corporation. 


VP/ix is a trademark of INTERACTIVE Systems Corporation and of Phoenix 
Technologies, Ltd. 


Locus is a registered trademark and Merge 386 is a trademark of Locus Comput- 
ing Corporation. 


Lotus and 1-2-3 are registered trademarks of Lotus Development Corporation. 


Microsoft, MS-DOS, and XENIX V/386 are registered trademarks of Microsoft 
Corporation. 


Prokey and RoseSoft are trademarks of RoseSoft, Inc. 













About the 
Author 


Lance A. Leventhal is an independent consultant specializing in microprocessors 
and personal computers. He has his own firm, Emulative Systems Company, in San
Diego, CA. He has helped develop many microprocessor-based systems, including 
communications controllers, navigation systems, signal processors, and instruments. 
He has served as a consultant for Rockwell International, Anderson-Jacobson, NCR, 
NASA, Disney, and Universities Space Research Association. 


Dr. Leventhal’s previous experience includes affiliations with Linkabit Corpora- 
tion, Intelcom Rad Tech, Naval Electronics Laboratory Center, and Harry Diamond 
Laboratories. He received a B.A. degree from Washington University (St. Louis, 
MO) and M.S. and Ph.D. degrees from the University of California, San Diego. He
is a member of SCS, ACM, IEEE, IEEE Computer Society, and ASEE.




Also in The 
PC Library 


The PC Configuration Handbook: A Complete Guide to Assembling, Enhancing, 
and Maintaining Your PC, by John Woram. Covers IBM PCs, ATs and Com- 
patibles. 


Master the Powerful World of the 80386! 


Lance Leventhal’s 80386 Programming Guide is a general reference manual 
for programmers, technicians, systems analysts, teachers, students, and hackers. 
Noted author Lance Leventhal guides you through the 80386 step-by-step from
its advanced 32-bit architecture to its most powerful multi-tasking functions. 


Regardless of whether you use an 80386-based system in technical, industrial, or 
business applications, this invaluable sourcebook provides an in-depth, easy-to- 
understand discussion of all major aspects of 80386 software development. 


The discussion includes: 

■ A task-oriented overview of the 80386’s instruction set and assembly language.
■ An explanation of I/O methods and descriptions of the common I/O chips.
■ Thorough inspection of the 80386’s memory and task management facilities.
■ A functional description of the 80386 hardware.


For the reader with some exposure to the 8088 family and assembly language 
programming, this book explains the 80386’s key features and the differences 
between it and earlier chips. It provides a basic understanding of how the 80386 
works and what its capabilities will mean to you in real applications.


Lance Leventhal is the author of over 20 computer books with more than 
700,000 copies in print. He is best known for his very popular series of chip 
books — on the 8080, 6800, 6809, and 68000 microprocessors — of which this 
book is a logical successor. 


“The 80386 makes personal computers come of age, and this book helps you to master all its
capabilities. The 80386 gives PCs the computing power and memory capacity of large
machines. It will lead the way to bringing large database applications, financial models,
CAD/CAM, artificial intelligence, robotics, and signal and image processing to the desktop.”
—Lance A. Leventhal